Abstract

We prove identifiability of parameters for a broad class of random graph mixture models. These models are characterized by a partition of the set of graph nodes into latent (unobservable) groups. The connectivities between nodes are independent random variables when conditioned on the groups of the nodes being connected. In the binary random graph case, in which edges are either present or absent, these models are known as stochastic blockmodels and have been widely used in the social sciences and, more recently, in biology. Their generalizations to weighted random graphs, either in parametric or non-parametric form, are also of interest in many areas. Despite a broad range of applications, the parameter identifiability issue for such models is involved, and has previously only been touched upon in the literature. We give here a thorough investigation of this problem. Our work also has consequences for parameter estimation. In particular, the estimation procedure proposed by Frank and Harary for binary affiliation models is revisited in this article.
1. Introduction

In modern statistical analyses, data is often structured using networks. Complex networks appear across many fields of science, including biology (metabolic networks, transcriptional regulatory networks, protein-protein interaction networks), sociology (social networks of acquaintance, or other connections between individuals), communications (the Internet), and others.

The literature contains many random graph models which incorporate a variety of characteristics of real-world graphs (such as scale-free or small-world properties). We refer to Newman (2003) and the references therein for an interesting introduction to networks.



One of the earliest and most studied random graph models was formulated by Erdős and Rényi (1959). In this setup, binary random graphs are modeled as a set of independent and identically distributed Bernoulli edge variables over a fixed set of nodes. The homogeneity of this model led to the introduction of mixture versions to better capture heterogeneity in data. Stochastic blockmodels (Daudin et al., 2008; Frank and Harary, 1982; Holland et al., 1983; Snijders and Nowicki, 1997) were introduced in various forms, primarily in the social sciences (White et al., 1976) to study relational data, and more recently in biology (Picard et al., 2009). In this context, the nodes are partitioned into latent groups (blocks) characterizing the relations between nodes. Blockmodelling thus refers to the particular structure of the adjacency matrix of the graph (i.e., the matrix containing edge indicators). When the nodes are ordered by the groups to which they belong, this matrix exhibits a block pattern. Diagonal and off-diagonal blocks, respectively, represent intra-group and inter-group connections. In the special case where blocks exhibit the same behavior within their type (diagonal or off-diagonal), we obtain a model with an affiliation structure (Frank and Harary, 1982).

Although the literature from the social sciences has focused mostly on binary relations, there is a growing interest in weighted graphs (Barrat et al., 2004; Newman, 2004). Mixture models have also been considered in the case of a finite number of possible relations (Nowicki and Snijders, 2001), and more recently with continuous edge variables (Ambroise and Matias, 2010; Mariadassou and Robin, 2010). Some variations that we shall not discuss here include models with covariates (Tallberg, 2005), mixed membership models (Airoldi et al., 2008; Latouche et al., 2009), and models with continuous latent variables (Daudin et al., 2010; Handcock et al., 2007). We also note that Newman and Leicht (2007) proposed another version of a binary mixture model, slightly different from the stochastic blockmodel considered here.

Many different parameter estimation procedures have been proposed for these models, such as Bayesian methods (Nowicki and Snijders, 2001; Snijders and Nowicki, 1997), variational Expectation-Maximization (EM) procedures (Daudin et al., 2008; Picard et al., 2009), online classification EM methods (Zanghi et al., 2008, 2010) and, more recently, direct mixture model based approaches (Ambroise and Matias, 2010). Consistency of all these procedures relies strongly on the identifiability of the model parameters. However, the literature on these models has not addressed this question in any depth. The trivial label-swapping problem is often mentioned: it is well known that the parameters may be recovered only up to permutations of the latent class labels. Whether this is the only issue preventing unique identification of parameters from the distribution, however, has never been investigated. Given the complex form of the model parameterization, this is not surprising, as any such analysis seems likely to be very involved.

In earlier work (Allman et al., 2009, Theorem 7), the authors took a first step towards understanding the parameter identifiability issue in binary random graph mixture models. While that article addressed a variety of models with latent variables, the present one focuses more specifically on random graph mixtures, giving parameter identifiability results for a broad range of such models. Moreover, part of our work sheds new light on parameter estimation procedures.

Allman et al. (2009) emphasized the usefulness of an algebraic theorem due to Kruskal (1976, 1977) (see also Rhodes, 2010) for establishing identifiability results in various models whose common feature is the presence of latent groups and at least three conditionally independent variables. Here, we instead focus on the family of random graph mixture models and explore various techniques to establish the identifiability of their parameters. Thus, while the method developed by Allman et al. (2009) is presented in Section 5.1 and finds further use in several arguments, it is only one of several techniques we use. The issue at the core of Kruskal's result is the decomposition of a 3-way array as a sum of rank-one tensors. While there exist approximate methods for performing this decomposition (see, e.g., Tomasi and Bro, 2006), this approach seems poorly suited to explicitly recovering the parameters from the distribution, and thus to constructing estimation procedures.

Some of our results focus on moment equations, as did those of Frank and Harary (1982), in one of the earliest works on binary affiliation models. In particular, we revisit some of their claims. The method consists of looking at the distribution of $K_n$, a complete set of edge variables over a set of n nodes. A natural question is then: what is the minimal value of n such that the complete distribution over all edge variables (a potentially infinite set) is characterized by the distribution of $K_n$? Despite this question's simplicity, we are far from having a complete answer to it. When looking at finite state distributions (e.g., for binary random graphs), knowledge of the distribution of $K_n$ is equivalent to knowledge of a certain set of moments of the distribution. Expressing the moments in terms of parameters gives a nonlinear polynomial system of equations, which one uses to identify parameters. The uniqueness of solutions to those systems, up to label swapping on parameters, is the issue at stake for identifiability.

For random graphs with continuous edge weights given by a parametric family of distributions, we shall see that the information contained in the model might be recovered from the distribution of $K_n$ for very small values of n. In this case, we rely on classical results on the identifiability of the parameters of a multivariate mixture due to Teicher (1967). Note that the main difference between classical mixtures and random graph mixtures is the non-independence of the variates.

In contrast to the approach based on Kruskal's Theorem, both the method utilizing moment equations and the one relying on multivariate mixtures lead to practical estimation procedures. These are further developed by Ambroise and Matias (2010).

In Allman et al. (2009), a large role was played by the notion of generic identifiability, under which all parameters except those lying on a proper algebraic subvariety are identifiable. In other words, in a parametric setting, the non-identifiable parameters are included in a subset whose dimension is strictly smaller than the dimension of the full parameter space. Thus, with probability one with respect to the Lebesgue measure, every parameter is identifiable. This notion of generic identifiability is important for finite mixtures of multivariate Bernoulli distributions (Allman et al., 2009; Carreira-Perpiñán and Renals, 2000; Gyllenberg et al., 1994) and also for hidden Markov models (Allman et al., 2009; Petrie, 1969). Here, we stress that some of our identifiability results are generic, while others are strict.

Finally, we note that our focus throughout will be on undirected graph models. While many of our results may be generalized to directed graphs, one must pay careful attention to the models' parametrization in doing so. For instance, some of the results would become simpler if the connectivities from group q to group l differed from those from group l to group q, as symmetry in a model can have a strong impact on identifiability questions. However, such asymmetric models require an increase in the number of parameters which may be excessive for data analysis.

This paper is organized as follows. Section 2 presents the various random graph mixture models: with either binary or, more generally, finite-state edges; both parametric and non-parametric models for edges with continuous weights; and the particular affiliation variant of these models. Section 3 gives parameter identifiability results for binary random graphs. Note that the affiliation model has to be handled separately. Section 4 takes up weighted random graphs, in both parametric and non-parametric variants. All the proofs are postponed to Section 5. In particular, Section 5.1 is devoted to a brief presentation of Kruskal's result and our use of it in the proofs of Theorems 2 and 14.

2. Notation and models 

We consider a probabilistic model on undirected and possibly weighted graphs as follows. Let n be a fixed number of nodes, with $Z_1, \dots, Z_n$ independent identically distributed (i.i.d.) random variables, taking values in $\{1, \dots, Q\}$ for some $Q \ge 2$. These random variables represent the Q groups the nodes are partitioned among, and are used to introduce heterogeneity in the model. With $\pi_q = \mathbb{P}(Z_1 = q) \in (0, 1)$, so that $\sum_q \pi_q = 1$, the vector $\pi = (\pi_q)$ gives the priors on the groups. Let $\{X_{ij}\}_{1 \le i < j \le n}$ be random edge variables taking values in a state space $\mathcal{X}$. Conditional on $Z_1, \dots, Z_n$, we assume that the edge variables $\{X_{ij}\}_{1 \le i < j \le n}$ are independent, and that the conditional distribution of $X_{ij}$ depends only on $Z_i$ and $Z_j$, the groups containing its endpoints.

We are interested in random graphs of various types. For binary random graphs, where $\mathcal{X} = \{0, 1\}$, an absent edge is represented by 0 and a present one by 1. Random graphs whose edges may be of finitely many types are modeled with $\mathcal{X} = \{1, \dots, \kappa\}$, or equivalently, $\{0, \dots, \kappa - 1\}$. More general weighted random graphs are obtained when $\mathcal{X} = \mathbb{N}$ or $\mathbb{R}^s$, $s \ge 1$.

In the binary state case, the distribution of $X_{ij}$ conditional on $Z_i, Z_j$ is Bernoulli with parameter $p_{Z_iZ_j} = \mathbb{P}(X_{ij} = 1 \mid Z_i, Z_j)$. As we consider only undirected graphs, we implicitly assume equality of the parameters $p_{ql} = p_{lq}$ for all $1 \le q, l \le Q$.
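To make the generative mechanism concrete, here is a minimal Python sketch that draws one binary random graph from this mixture model: labels first, then conditionally independent Bernoulli edges. The function name `sample_binary_sbm`, the 0-based group labels and the dictionary representation of the edges are illustrative choices of ours, not part of the model specification.

```python
import random


def sample_binary_sbm(n, pi, p, seed=None):
    """Draw one graph from the binary random graph mixture model (hypothetical helper).

    n  : number of nodes.
    pi : list of Q group priors pi_q (summing to 1).
    p  : symmetric Q x Q nested list, p[q][l] = P(X_ij = 1 | Z_i = q, Z_j = l).
    Returns the latent labels Z (0-based, unlike the labels 1..Q in the text)
    and a dict mapping each pair (i, j), i < j, to the edge indicator X_ij.
    """
    rng = random.Random(seed)
    Q = len(pi)
    # Latent groups: Z_1, ..., Z_n are i.i.d. with P(Z_i = q) = pi_q.
    Z = rng.choices(range(Q), weights=pi, k=n)
    X = {}
    for i in range(n):
        for j in range(i + 1, n):
            # Given the labels, edges are independent Bernoulli variables
            # whose parameter depends only on the groups of the endpoints.
            X[(i, j)] = 1 if rng.random() < p[Z[i]][Z[j]] else 0
    return Z, X


Z, X = sample_binary_sbm(50, [0.3, 0.7], [[0.8, 0.1], [0.1, 0.4]], seed=1)
```

Here `Z` holds 50 labels and `X` the 1225 edge indicators of one realization; only `X` would be observed in practice.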

More generally, in the finite state case, with $\mathcal{X} = \{1, \dots, \kappa\}$, the vector $\mathbf{p}_{Z_iZ_j} = (p_{Z_iZ_j}(1), \dots, p_{Z_iZ_j}(\kappa))$ contains the values $p_{Z_iZ_j}(k) = \mathbb{P}(X_{ij} = k \mid Z_i, Z_j)$, for $1 \le k \le \kappa$, with $\sum_k p_{Z_iZ_j}(k) = 1$. We also implicitly assume equality of the vectors $\mathbf{p}_{ql} = \mathbf{p}_{lq}$ for all $1 \le q, l \le Q$. We introduce this model primarily as a tool in the study of continuously weighted random graphs, though it might be useful for studying relationships between nodes of different types (colors), or of varying but discrete strengths (viewing the states as ordered). Note that a related model is described by Nowicki and Snijders (2001), where the authors consider more general relation types (not necessarily edges, whether directed or not) occurring between a pair of nodes.

In the weighted random graph case, edges may be viewed as either absent ($X_{ij} = 0$) or present ($X_{ij} \neq 0$), with those present having a weight, namely a non-zero value in $\mathcal{X} = \mathbb{N}$, $\mathbb{R}$, or $\mathbb{R}^s$. The distribution of $X_{ij}$ conditional on $Z_i, Z_j$ may be assumed to have either a parametric or non-parametric form. More precisely, we assume that the distribution of $X_{ij}$ conditional on $Z_i, Z_j$ is the probability measure $\mu_{Z_iZ_j}$ on $\mathcal{X}$ given by

$$\mu_{ql} = (1 - p_{ql})\delta_0 + p_{ql}F_{ql}, \quad 1 \le q, l \le Q,$$

where $p_{ql} \in (0, 1]$ is a sparsity parameter, $\delta_0$ is the Dirac mass at zero and $F_{ql}$ is a probability measure on $\mathcal{X}$ with density $f_{ql}$ with respect to either the counting measure on $\mathbb{N}$ or the Lebesgue measure on $\mathbb{R}$ or $\mathbb{R}^s$. We also implicitly assume $\mu_{ql} = \mu_{lq}$ for all $1 \le q, l \le Q$. In the parametric case, we assume moreover that $F_{ql} = F(\cdot, \theta_{ql})$ and $f_{ql} = f(\cdot, \theta_{ql})$, where the parameter $\theta_{ql}$ belongs to a parameter set $\Theta$ (a subset of some Euclidean space). In the non-parametric case we assume $F_{ql}$ is absolutely continuous.

We shall always assume that $F_{ql}$ has no point mass at zero; otherwise the sparsity parameter $p_{ql}$ cannot be identified from the mixture $\mu_{ql}$. For instance, when considering Poisson weights, $f_{ql}$ is the Poisson density truncated at zero,

$$f_{ql}(k) = \frac{\theta_{ql}^k}{k!}\,(e^{\theta_{ql}} - 1)^{-1}, \quad k \ge 1.$$
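The role of the no-atom-at-zero assumption can be seen on the probability mass function of $\mu_{ql}$ with zero-truncated Poisson weights. In the small Python sketch below (the helper name `weighted_edge_pmf` is hypothetical), $\mathbb{P}(X_{ij} = 0 \mid Z_i = q, Z_j = l) = 1 - p_{ql}$ exactly, because $F_{ql}$ puts no mass at 0; this is what lets the sparsity parameter be read off the mixture.

```python
import math


def weighted_edge_pmf(k, p, theta):
    """P(X_ij = k | Z_i = q, Z_j = l) when mu_ql = (1 - p) delta_0 + p F_ql and
    F_ql is the zero-truncated Poisson law with parameter theta (hypothetical helper)."""
    if k == 0:
        # F_ql has no atom at 0, so all the mass at zero comes from the Dirac part.
        return 1.0 - p
    # Truncated Poisson density theta^k / k! * (e^theta - 1)^(-1), for k >= 1.
    return p * theta ** k / math.factorial(k) / math.expm1(theta)
```

`math.expm1(theta)` computes $e^{\theta} - 1$ accurately even for small $\theta$, which is why it is used for the normalizing constant.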

A particular instance of these models is the affiliation model, which additionally assumes only two connection distributions: one for intra-group connections and another for inter-group connections. Thus the binary state case of the affiliation model assumes

$$p_{ql} = \begin{cases} \alpha & \text{if } q = l, \\ \beta & \text{if } q \neq l, \end{cases} \quad \text{for all } q, l \in \{1, \dots, Q\}.$$

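The affiliation model in the continuous observations case is described similarly, with $\mu_{ql} = \mu_{\text{in}}\mathbb{1}_{q = l} + \mu_{\text{out}}\mathbb{1}_{q \neq l}$ for all $1 \le q, l \le Q$. More precisely, in the continuous parametric case, for all $q, l \in \{1, \dots, Q\}$ we set

$$p_{ql} = \begin{cases} \alpha & \text{if } q = l, \\ \beta & \text{if } q \neq l, \end{cases} \quad \text{and} \quad \theta_{ql} = \begin{cases} \theta_{\text{in}} & \text{if } q = l, \\ \theta_{\text{out}} & \text{if } q \neq l. \end{cases}$$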

For all these models, we consider restrictions of the model distribution by focusing on a subset of the nodes. We denote by $K_n$ the complete set of $\binom{n}{2}$ edge variables associated with a subset of n nodes. Note that the distribution of these variables does not depend on the choice of which n nodes one considers. Also, while this notation is motivated by that used in graph theory, where $K_n$ denotes the complete graph on n nodes, we emphasize that here $K_n$ is a set of random variables, and we are making no statement as to whether these edges are present or absent in any realization of our model.



3. Binary random graphs 

We first focus on models with binary edge states, considering the more general case 
with arbitrary connectivity parameters, followed by affiliation models. 
3.1. The non-affiliation case

When $\mathcal{X} = \{0, 1\}$, a first result on identifiability of parameters was obtained by Allman et al. (2009) for the special case of Q = 2 groups. For completeness, we recall the statement here.

Theorem 1. (Allman et al., 2009, Theorem 7). The parameters $\pi_1, \pi_2 = 1 - \pi_1, p_{11}, p_{12}, p_{22}$ of the random graph mixture model with binary edge state variables and Q = 2 groups are identifiable, up to label swapping, from the distribution of $K_{16}$, provided that the connectivity parameters $\{p_{11}, p_{12}, p_{22}\}$ are distinct.

In particular, the result remains valid when the group proportions $\pi_q$ are fixed.

Note that the assumption $p_{11} \neq p_{22}$ limits this theorem to the strict non-affiliation case.

The proof of this theorem is based on a clever application of an algebraic result, due to Kruskal (1976, 1977) (see also Rhodes, 2010), that deals with decompositions of 3-way arrays. While generalizing the proof to more than two groups requires substantially more effort, the basic method still applies. Here we prove the following theorem.

Theorem 2. The parameters $\pi_q$, $1 \le q \le Q$, and $p_{ql} = \mathbb{P}(X_{ij} = 1 \mid Z_i = q, Z_j = l)$, $1 \le q \le l \le Q$, of the random graph mixture model with binary edge state variables and $Q \ge 3$ groups are generically identifiable, up to label swapping, from the distribution of $K_n$ when



Moreover, the result remains valid when the group proportions $\pi_q$ are fixed.

Note that the stated number of nodes ensuring that parameters are generically identifiable from the distribution of the edges may not be optimal. In particular, when Q = 2, the proof of this theorem is still valid, yet it gives a minimal number of $n = 25$ nodes. This is larger than the bound 16 obtained in Theorem 1, and that number may itself not be optimal.

Also, while Theorem 1 gives exact restrictions on parameters producing identifiability, Theorem 2 is not explicit about the generic conditions. However, for any fixed Q the argument in our proof does yield a straightforward, though perhaps lengthy, means of checking whether a particular choice of parameters meets the conditions. Among these is a requirement that the $p_{ql}$ be distinct, so the theorem does not apply to the affiliation model.

Moreover, a careful reading of the proof of the theorem shows that its generic aspect concerns only the part of the parameter space with the connectivities $p_{ql}$. This enables us to conclude that even when considering subsets defined by restriction of the group proportions $\pi_q$ (for instance assuming the group proportions are fixed, or equal), the result remains valid.

3.2. The affiliation model 

In the particular case of the affiliation model, we can obtain results from arguments 
based on moments of the distribution. For a small number of nodes, one may obtain 
explicit formulas for the moments in terms of model parameters. By analyzing the 
solutions to this nonlinear multivariate polynomial system of equations, one can address 
the question of parameter identifiability, as well as develop estimation procedures. 
3.2.1. Relying on the distribution of $K_3$

Frank and Harary (1982) presented a method for estimating the parameters of the binary affiliation model based only on the distribution of triplet cycles $(X_{ij}, X_{jk}, X_{ki})$, $1 \le i < j < k \le n$, of edge variables. From an identifiability perspective, this corresponds to identifying the parameters from the distribution of $K_3$. They suggest estimating the parameters by solving the empirical moment equations. However, they do not discuss uniqueness of the solutions to these equations, even though this issue is a delicate one for nonlinear equations.

In the following, we first explore the use of the distribution of $K_3$ alone to identify model parameters. As a consequence, we exhibit a new estimation procedure for the parameters.

The distribution of a triplet $(X_{ij}, X_{jk}, X_{ki})$ is expressible in terms of the indeterminates $\alpha$, $\beta$ and the $\pi_q$'s. Let us denote by $s_2$ and $s_3$ the sums of the squares and cubes of the $\pi_q$'s and, more generally, let

$$s_k = \sum_{q=1}^{Q} \pi_q^k.$$

Then one easily computes (see also Frank and Harary, 1982) the moment formulas

$$m_1 = \mathbb{E}(X_{ij}) = s_2\alpha + (1 - s_2)\beta, \tag{1}$$
$$m_2 = \mathbb{E}(X_{ij}X_{ik}) = s_3\alpha^2 + 2(s_2 - s_3)\alpha\beta + (1 - 2s_2 + s_3)\beta^2, \tag{2}$$
$$m_3 = \mathbb{E}(X_{ij}X_{jk}X_{ki}) = s_3\alpha^3 + 3(s_2 - s_3)\alpha\beta^2 + (1 - 3s_2 + 2s_3)\beta^3, \tag{3}$$

which completely characterize the distribution of $(X_{ij}, X_{jk}, X_{ki})$.
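Formulas (1)-(3) are easy to check numerically. The Python sketch below (helper names are ours) evaluates them and compares each with a brute-force expectation obtained by summing over all group assignments of the nodes involved; conditionally on an assignment, the edges are independent Bernoulli variables, so each term is a simple product.

```python
import itertools


def affiliation_moments(alpha, beta, pi):
    """Moments (1)-(3) of the binary affiliation model, via s_2 and s_3."""
    s2 = sum(x**2 for x in pi)
    s3 = sum(x**3 for x in pi)
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2 * s2 + s3) * beta**2
    m3 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    return m1, m2, m3


def brute_force_moment(edges, n, alpha, beta, pi):
    """E[prod of X_ij over edges], summing over all Q**n group assignments of n nodes."""
    total = 0.0
    for z in itertools.product(range(len(pi)), repeat=n):
        weight = 1.0
        for group in z:
            weight *= pi[group]
        # Given z, the edges are independent Bernoulli(alpha or beta) variables.
        for i, j in edges:
            weight *= alpha if z[i] == z[j] else beta
        total += weight
    return total
```

For example, `brute_force_moment([(0, 1), (1, 2), (0, 2)], 3, alpha, beta, pi)` should agree with the third value returned by `affiliation_moments(alpha, beta, pi)` up to rounding.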

Note that in the important case of a uniform node distribution, where $\pi_q = 1/Q$ for all q, we have $s_k = Q^{1-k}$. This implies $s_3 = s_2^2$, and hence $m_2 = m_1^2$, so these equations reduce to two independent ones. As a consequence, the claim by Frank and Harary (1982) that it is then possible to estimate the three unknowns $Q$, $\alpha$, $\beta$ relying only on these moment equations is not correct.
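The failure can be made explicit. Under $\pi_q = 1/Q$, rearranging (1)-(3) gives $m_1 = \beta + d$, $m_2 = m_1^2$ and $m_3 - m_1^3 = (Q-1)d^3$ with $d = (\alpha - \beta)/Q$ (a rearrangement we add here for illustration). The Python sketch below uses this to build, for a different number of groups, another parameter triple with exactly the same three moments. The function names and the values Q = 2, $\alpha = 0.8$, $\beta = 0.2$ are arbitrary illustrations.

```python
def uniform_affiliation_moments(Q, alpha, beta):
    """Moments (1)-(3) when pi_q = 1/Q for all q, so that s_k = Q**(1 - k)."""
    s2, s3 = Q ** -1.0, Q ** -2.0
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2 * s2 + s3) * beta**2
    m3 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    return m1, m2, m3


def matching_parameters(m1, m3, Q_new):
    """Return (alpha', beta') for Q_new uniform groups with the same m1 and m3.

    Uses m1 = beta + d and m3 - m1**3 = (Q - 1) * d**3, where d = (alpha - beta) / Q;
    assumes m3 > m1**3 (i.e. alpha > beta) so that the cube root is real and positive.
    """
    d = ((m3 - m1**3) / (Q_new - 1)) ** (1.0 / 3.0)
    beta_new = m1 - d
    return beta_new + Q_new * d, beta_new


# Illustrative parameters: Q = 2, alpha = 0.8, beta = 0.2.
m = uniform_affiliation_moments(2, 0.8, 0.2)
alpha3, beta3 = matching_parameters(m[0], m[2], 3)
m_alt = uniform_affiliation_moments(3, alpha3, beta3)
```

Here the Q = 3 triple is roughly $\alpha \approx 0.976$, $\beta \approx 0.262$; since $m_2 = m_1^2$ holds automatically under uniform priors, matching $m_1$ and $m_3$ matches all three moments, so the distribution of $K_3$ cannot tell the two models apart.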

Still, there are indeed several situations in which parameters are identifiable from 
these moments, as we next discuss. 

With Q = 2 latent groups and a possibly non-uniform group distribution, there are 3 independent parameters in the affiliation model. In this case, the three moments above are enough to identify the parameters. To show this, we first construct certain polynomials with roots at the connectivity parameters. Since the construction easily extends to larger Q, we give it more generally.

Proposition 3. Consider the random graph affiliation mixture model with $Q \ge 2$ groups and binary edge state variables, on $Q + 1$ nodes. Then the parameter $\alpha$ is a real root of the degree $\binom{Q+1}{2}$ univariate polynomial

$$U_Q(X) = \mathbb{E}\Big(\prod_{1 \le i < j \le Q+1} (X - X_{ij})\Big).$$

The polynomial

$$V_Q(X, Y) = \mathbb{E}\Big(\Big(X + (Q - 1)Y - \sum_{1 \le i \le Q} X_{i,Q+1}\Big) \prod_{1 \le i < j \le Q} (X - X_{ij})\Big),$$

of degree $\binom{Q}{2} + 1$ in X and degree 1 in Y, vanishes at $(X, Y) = (\alpha, \beta)$. Moreover, the coefficient of Y in $V_Q(\alpha, Y)$ is non-zero precisely when $\alpha \neq \beta$.

The utility of these polynomials is that from the distribution of $K_{Q+1}$, the polynomial $U_Q$ allows one to recover at most $\binom{Q+1}{2}$ candidate values for $\alpha$, and then for each such value $V_Q$ allows one to recover a unique candidate for $\beta$. While some of these candidates could be ruled out as not lying in (0, 1), we do not know when this leaves a unique $\alpha$ and $\beta$ for $Q \ge 3$. In the case of Q = 2 groups, however, we prove that these polynomials uniquely identify the parameters.

Theorem 4. In the random graph affiliation mixture model with Q = 2 groups and binary edge state variables, the parameter $\alpha$ is the unique real root of the polynomial

$$U_2(X) = X^3 - 3m_1X^2 + 3m_2X - m_3.$$

Moreover, as soon as $\alpha \neq \beta$, the parameter $\beta$ is the unique real root of the polynomial $V_2(\alpha, Y)$, where

$$V_2(X, Y) = X^2 + XY - 3m_1X - m_1Y + 2m_2.$$

Once $\alpha$ and $\beta$ are uniquely identified, we may determine from equation (1) the value of $s_2$ (again using that $\alpha \neq \beta$) and hence $\pi_1, \pi_2$, up to permutation. This proves the following corollary.

Corollary 5. The parameters $\{\pi_1, \pi_2 = 1 - \pi_1\}$, up to label swapping, and $\alpha$, $\beta$ of the random graph affiliation mixture model with Q = 2 groups and binary edge state variables are strictly identifiable from the distribution of $K_3$, provided $\alpha \neq \beta$.
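The Q = 2 procedure behind Theorem 4 and Corollary 5 can be sketched in a few lines of Python. As an illustration only (the function name and the use of bisection are our choices), $\alpha$ is located as the real root of $U_2$ on [0, 1], using $U_2(0) = -m_3 \le 0$ and $U_2(1) = \mathbb{P}(X_{12} = X_{13} = X_{23} = 0) \ge 0$; $\beta$ then solves the linear equation $V_2(\alpha, Y) = 0$, and equation (1) gives $s_2 = 1 - 2\pi_1\pi_2$, from which $\pi_1, \pi_2$ follow up to label swapping.

```python
import math


def recover_two_group_affiliation(m1, m2, m3, tol=1e-14):
    """Recover (alpha, beta, pi_1, pi_2) from (m1, m2, m3), assuming Q = 2 and
    alpha != beta; the priors are returned sorted, i.e. up to label swapping."""

    def U2(x):
        # U_2(X) = X^3 - 3 m1 X^2 + 3 m2 X - m3 (Theorem 4).
        return x**3 - 3 * m1 * x**2 + 3 * m2 * x - m3

    # U_2(0) <= 0 <= U_2(1) and alpha is the only real root, so bisect on [0, 1].
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if U2(mid) <= 0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    # V_2(alpha, Y) = (alpha^2 - 3 m1 alpha + 2 m2) + (alpha - m1) Y is linear in Y.
    beta = -(alpha**2 - 3 * m1 * alpha + 2 * m2) / (alpha - m1)
    # Equation (1) gives s2; pi_1, pi_2 are the roots of t^2 - t + (1 - s2) / 2.
    s2 = (m1 - beta) / (alpha - beta)
    disc = math.sqrt(max(2.0 * s2 - 1.0, 0.0))
    return alpha, beta, (1 - disc) / 2, (1 + disc) / 2


# Illustrative parameters, with exact moments from equations (1)-(3).
a, b, p1 = 0.7, 0.2, 0.3
s2, s3 = p1**2 + (1 - p1) ** 2, p1**3 + (1 - p1) ** 3
m1 = s2 * a + (1 - s2) * b
m2 = s3 * a**2 + 2 * (s2 - s3) * a * b + (1 - 2 * s2 + s3) * b**2
m3 = s3 * a**3 + 3 * (s2 - s3) * a * b**2 + (1 - 3 * s2 + 2 * s3) * b**3
estimate = recover_two_group_affiliation(m1, m2, m3)
```

Replacing the exact moments by empirical frequencies of edges, cherries and triangles would turn this into a plug-in estimator, although the sketch does not handle sampling noise (for instance, estimates falling outside (0, 1)).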

Identifiability of $\alpha$ and $\beta$ when Q and the $\pi_q$'s are known. When the $\pi_q$'s are known, Frank and Harary (1982) suggested solving any two of the three empirical counterparts of equations (1), (2) and (3), leading to three different methods of estimating $\alpha$ and $\beta$. However, numerical experiments convinced us that two equations are in general not sufficient to uniquely determine the parameters. In fact, it is not immediately clear that even with the three moment equations (either the theoretical ones for the question of identification, or their empirical counterparts for estimation) a unique solution is determined. Below we give explicit formulas for the solution to the system, which in most cases are even rational, involving no extraction of roots. These can thus be easily used to construct estimators.

Theorem 6. If $m_2 \neq m_1^2$, then $\pi$ is non-uniform and we can recover the parameters $\beta$ and $\alpha$ via the rational formulas

$$\beta = \frac{(s_3 - s_2s_3)m_1^3 + (s_2^3 - s_3)m_2m_1 + (s_3s_2 - s_2^3)m_3}{(m_1^2 - m_2)(2s_2^3 - 3s_3s_2 + s_3)}, \qquad \alpha = \frac{m_1 + (s_2 - 1)\beta}{s_2}.$$

If $m_2 = m_1^2$, then $\pi$ is uniform and we have

$$\beta = m_1 + \left(\frac{m_1^3 - m_3}{Q - 1}\right)^{1/3} \quad \text{and} \quad \alpha = Qm_1 + (1 - Q)\beta,$$

where the cube root is the real one.

Implicit in this statement is the fact that the denominators in the above formulas are non-zero. Note that the uniform group prior case formula is used for estimation by Ambroise and Matias (2010).
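Theorem 6 translates directly into a plug-in procedure. The Python sketch below (function name hypothetical) evaluates the two formulas of the theorem for known priors $\pi$. The tolerance used to decide whether $m_2 = m_1^2$ is an assumption needed for floating-point input; with empirical moments one would also have to cope with sampling noise, which the sketch does not address.

```python
import math


def recover_alpha_beta_known_pi(m1, m2, m3, pi, tol=1e-12):
    """Theorem 6: (alpha, beta) from the moments (m1, m2, m3) when pi is known."""
    Q = len(pi)
    s2 = sum(x**2 for x in pi)
    s3 = sum(x**3 for x in pi)
    if abs(m2 - m1**2) > tol:
        # Non-uniform priors: rational formulas, no root extraction needed.
        num = ((s3 - s2 * s3) * m1**3 + (s2**3 - s3) * m2 * m1
               + (s3 * s2 - s2**3) * m3)
        den = (m1**2 - m2) * (2 * s2**3 - 3 * s3 * s2 + s3)
        beta = num / den
        alpha = (m1 + (s2 - 1) * beta) / s2
    else:
        # Uniform priors (pi_q = 1/Q): a real cube root is required.
        c = (m1**3 - m3) / (Q - 1)
        beta = m1 + math.copysign(abs(c) ** (1.0 / 3.0), c)
        alpha = Q * m1 + (1 - Q) * beta
    return alpha, beta
```

The explicit `math.copysign` is needed because, in Python, raising a negative float to the power 1/3 does not return the real cube root.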

We immediately obtain the following corollary. 

Corollary 7. For any fixed and known values of $\pi_q \in (0, 1)$, $1 \le q \le Q$, both parameters $\alpha$, $\beta$ of the random graph affiliation model with binary edge state variables are identifiable from the distribution of $K_3$.

The proofs of the previous statements lead to an interesting polynomial in the moments, whose vanishing detects the Erdős-Rényi model, corresponding to a single node group.

Proposition 8. The moments of a random graph affiliation model with binary edge state variables, Q node states, and $\alpha \neq \beta$ satisfy

$$2m_1^3 - 3m_1m_2 + m_3 = 0$$

if, and only if, Q = 1.

This proposition follows from expressing the moments in terms of parameters to see that

$$2m_1^3 - 3m_1m_2 + m_3 = (\alpha - \beta)^3(2s_2^3 - 3s_2s_3 + s_3),$$

together with the determination in the proof of Lemma 19 in Section 5.3 that $2s_2^3 - 3s_2s_3 + s_3 \neq 0$ when $\pi_q > 0$ for more than one group q.

3.2.2. Relying on the distribution of $K_4$

We next investigate parameter identifiability from the distribution of the edge variables over more than 3 nodes, paying particular attention to the case of n = 4 nodes.

Necessary conditions for identifiability of the $\pi_q$'s, when Q is known. First, we establish that for an affiliation model, if the $\pi_q$'s are unknown and are to be recovered from the distribution of $K_n$, then one must consider at least n = Q nodes. Note that this applies not only to the binary edge state model, but to more general weighted edge models as well.

Proposition 9. In order to identify, up to label swapping, the parameters $\{\pi_q\}_{1 \le q \le Q}$ from an affiliation random graph mixture distribution on $K_n$ (either binary or weighted), it is necessary that $n \ge Q$.

The condition in this proposition is in general not sufficient to identify the $\pi_q$'s. Indeed, the binary edge state affiliation model with Q = 3 has 4 parameters. However, the set of distributions over $K_3$ has dimension at most 3 (according to equations (1), (2) and (3)), which is not sufficient to identify the 4 parameters.
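$m_1 = \mathbb{E}(X_{12}) = s_2\alpha + (1 - s_2)\beta$

$m_2 = \mathbb{E}(X_{12}X_{13}) = s_3\alpha^2 + 2(s_2 - s_3)\alpha\beta + (1 - 2s_2 + s_3)\beta^2$

$m_{31} = \mathbb{E}(X_{12}X_{13}X_{23}) = s_3\alpha^3 + 3(s_2 - s_3)\alpha\beta^2 + (1 - 3s_2 + 2s_3)\beta^3$

$m_{32} = \mathbb{E}(X_{12}X_{13}X_{14}) = s_4\alpha^3 + 3(s_3 - s_4)\alpha^2\beta + 3(s_2 - 2s_3 + s_4)\alpha\beta^2 + (1 - 3s_2 + 3s_3 - s_4)\beta^3$

$m_{33} = \mathbb{E}(X_{12}X_{23}X_{34}) = s_4\alpha^3 + (s_2^2 + 2s_3 - 3s_4)\alpha^2\beta + (3s_2 - 2s_2^2 - 4s_3 + 3s_4)\alpha\beta^2 + (1 - 3s_2 + s_2^2 + 2s_3 - s_4)\beta^3$

$m_{41} = \mathbb{E}(X_{12}X_{23}X_{34}X_{41}) = s_4\alpha^4 + 2(s_2^2 + 2s_3 - 3s_4)\alpha^2\beta^2 + 4(s_2 - s_2^2 - 2s_3 + 2s_4)\alpha\beta^3 + (1 - 4s_2 + 2s_2^2 + 4s_3 - 3s_4)\beta^4$

$m_{42} = \mathbb{E}(X_{12}X_{13}X_{14}X_{23}) = s_4\alpha^4 + (s_3 - s_4)\alpha^3\beta + (s_2^2 + 2s_3 - 3s_4)\alpha^2\beta^2 + (4s_2 - 2s_2^2 - 7s_3 + 5s_4)\alpha\beta^3 + (1 - 4s_2 + s_2^2 + 4s_3 - 2s_4)\beta^4$

$m_5 = \mathbb{E}(X_{12}X_{23}X_{34}X_{41}X_{13}) = s_4\alpha^5 + 2(s_3 - s_4)\alpha^3\beta^2 + (2s_3 - 4s_4 + 2s_2^2)\alpha^2\beta^3 + (5s_2 - 4s_2^2 - 10s_3 + 9s_4)\alpha\beta^4 + (1 - 5s_2 + 2s_2^2 + 6s_3 - 4s_4)\beta^5$

$m_6 = \mathbb{E}(X_{12}X_{23}X_{34}X_{41}X_{13}X_{24}) = s_4\alpha^6 + 4(s_3 - s_4)\alpha^3\beta^3 + 3(s_2^2 - s_4)\alpha^2\beta^4 + 6(s_2 - s_2^2 - 2s_3 + 2s_4)\alpha\beta^5 + (1 - 6s_2 + 8s_3 - 6s_4 + 3s_2^2)\beta^6$

Table 1: Moment formulas describing the distribution of $K_4$, the complete graph on 4 nodes, for the binary affiliation model.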



Distribution on $K_4$. The moment formulas describing the distribution of the affiliation random graph mixture model on $K_4$ are given in Table 1. Note that $m_{31}$ is the same as $m_3$ in the last subsection, and that we omit $\mathbb{E}(X_{12}X_{34}) = (\mathbb{E}(X_{12}))^2$ since edge variables with no endpoints in common are independent. To facilitate understanding of the moments in the table, their corresponding induced motifs are shown in Figure 1.
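Entries of Table 1 can be checked by brute force. The Python sketch below (helper names are ours) computes the expectation of any product of edge indicators on $K_4$ by enumerating all $Q^4$ group assignments of the four nodes, and evaluates the Table 1 entry for the 4-cycle moment $m_{41}$ for comparison; any other row can be checked the same way by changing the edge list.

```python
import itertools


def motif_moment(edges, alpha, beta, pi, n=4):
    """E[prod of X_ij over edges] in the binary affiliation model, by enumerating
    all Q**n group assignments of the n nodes (nodes labelled 0..n-1)."""
    total = 0.0
    for z in itertools.product(range(len(pi)), repeat=n):
        w = 1.0
        for q in z:
            w *= pi[q]
        # Conditionally on z, each edge contributes alpha (same group) or beta.
        for i, j in edges:
            w *= alpha if z[i] == z[j] else beta
        total += w
    return total


def m41_table(alpha, beta, pi):
    """Table 1 entry for the 4-cycle moment E(X12 X23 X34 X41)."""
    s2, s3, s4 = (sum(x**k for x in pi) for k in (2, 3, 4))
    return (s4 * alpha**4 + 2 * (s2**2 + 2 * s3 - 3 * s4) * alpha**2 * beta**2
            + 4 * (s2 - s2**2 - 2 * s3 + 2 * s4) * alpha * beta**3
            + (1 - 4 * s2 + 2 * s2**2 + 4 * s3 - 3 * s4) * beta**4)


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # the motif of m41, with 0-based nodes
```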



Figure 1: Correspondence between moments and motifs for $K_4$.

With Q arbitrary, but a uniform prior on the nodes ($\pi_q = 1/Q$, so $s_k = Q^{1-k}$), there are algebraic relationships between the moments on $K_4$, including

$$m_2 = m_1^2, \quad m_{32} = m_{33} = m_1^3, \quad m_{42} = m_1m_{31},$$

and more complicated ones that can be computed using Gröbner basis methods to eliminate $\alpha$, $\beta$, and $1/Q$ from the equations. (Cox et al., 1997, provide an excellent grounding in this computational algebra.) However, the 3 parameters $\alpha$, $\beta$, Q of this affiliation model are, in fact, identifiable. Indeed, such calculations show that the formulas for $m_1$, $m_{31}$, and $m_{41}$ alone imply the following.

Proposition 10. The number of node groups, Q, in a random graph affiliation model with binary edge state variables and uniform group priors can be identified from the moments $m_1$, $m_{31}$, and $m_{41}$ by

$$Q = \frac{-m_{31}^4 - m_{41}^3 - 3m_{41}m_1^8 + 3m_{41}^2m_1^4 - 6m_1^6m_{31}^2 + 4m_1^3m_{31}^3 + 4m_1^9m_{31}}{(m_1^4 - m_{41})^3}.$$

Note that, replacing the moments with empirical estimators, this formula could be 
used for estimation of Q. 
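Under uniform priors one can check that $m_{31} - m_1^3 = (Q-1)d^3$ and $m_{41} - m_1^4 = (Q-1)d^4$ with $d = (\alpha - \beta)/Q$, so the formula of Proposition 10 is equivalent to the compact form $Q = 1 + (m_{31} - m_1^3)^4/(m_{41} - m_1^4)^3$. The Python sketch below (function names hypothetical) uses this compact form and checks it on exact moments computed from equation (3) and the $m_{41}$ row of Table 1; it is an illustration, not the authors' estimation code.

```python
def q_from_moments(m1, m31, m41):
    """Number of groups via Proposition 10, in the equivalent compact form
    Q = 1 + (m31 - m1**3)**4 / (m41 - m1**4)**3 (uniform priors, alpha != beta)."""
    return 1 + (m31 - m1**3) ** 4 / (m41 - m1**4) ** 3


def uniform_moments_k4(Q, alpha, beta):
    """m1, m31 and m41 of the binary affiliation model with pi_q = 1/Q (s_k = Q**(1-k))."""
    s2, s3, s4 = Q ** -1.0, Q ** -2.0, Q ** -3.0
    m1 = s2 * alpha + (1 - s2) * beta
    m31 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    m41 = (s4 * alpha**4 + 2 * (s2**2 + 2 * s3 - 3 * s4) * alpha**2 * beta**2
           + 4 * (s2 - s2**2 - 2 * s3 + 2 * s4) * alpha * beta**3
           + (1 - 4 * s2 + 2 * s2**2 + 4 * s3 - 3 * s4) * beta**4)
    return m1, m31, m41
```

For instance, `q_from_moments(*uniform_moments_k4(4, 0.6, 0.1))` returns 4 up to rounding error. With empirical moments, the differences in the formula may be small, so the estimate can be sensitive to sampling noise.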

Of course, once the formula in Proposition 10 is given, it can be most easily verified by expressing the moments in terms of parameters and simplifying. Note that the denominator here does not vanish, as may be seen in two different ways: either by Lemma 20 in Section 5.3, or by checking that, for $\alpha \neq \beta$ and $Q \ge 2$,

$$m_{41} - m_1^4 = (\alpha - \beta)^4\,\frac{Q - 1}{Q^4} \neq 0.$$

Once Q is identified by this formula, since we are assuming $\pi_q = 1/Q$, Corollary 7 applies, so that $\alpha$ and $\beta$ are identifiable as well. Thus we have shown the following.

Corollary 11. The parameters $\alpha$, $\beta$, and Q of the random graph affiliation mixture model with binary edge state variables and uniform group priors ($\pi_q = 1/Q$) are identifiable from the distribution of $K_4$.

4. Weighted random graphs 

4.1. The parametric case

In the parametric case, where $F_{ql}$ has parametric form $F(\cdot, \theta_{ql})$, we can uniquely identify the connectivity parameters under very general conditions by considering the distribution of $K_3$ only. Indeed, each triplet $(X_{ij}, X_{ik}, X_{jk})$ follows a mixture of distributions, each with three variates, comprising

• Q terms of the form $\mu_{qq}(X_{ij})\mu_{qq}(X_{ik})\mu_{qq}(X_{jk})$, each with prior $\pi_q^3$, where $1 \le q \le Q$,

• $3Q(Q-1)$ terms of the form $\mu_{qq}(X_{ij})\mu_{ql}(X_{ik})\mu_{ql}(X_{jk})$ (permuting i, j and k), each with prior $\pi_q^2\pi_l$, with distinct $q, l \in \{1, 2, \dots, Q\}$,

• $Q(Q-1)(Q-2)$ terms of the form $\mu_{ql}(X_{ij})\mu_{qm}(X_{ik})\mu_{lm}(X_{jk})$, each with prior $\pi_q\pi_l\pi_m$, with distinct $q, l, m \in \{1, 2, \dots, Q\}$.
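The three families of terms above can be enumerated mechanically, which also confirms the counts Q, $3Q(Q-1)$ and $Q(Q-1)(Q-2)$ and that the priors sum to one. In the Python sketch below (helper names hypothetical), each ordered triple of groups for nodes i, j, k yields one component, recorded by the unordered group pairs that index the laws of $X_{ij}$, $X_{ik}$ and $X_{jk}$.

```python
import itertools
from collections import Counter


def triplet_mixture_components(pi):
    """Components of the law of (X_ij, X_ik, X_jk) as (key, prior weight) pairs.

    Each ordered triple (q, l, m) of groups for nodes (i, j, k) gives one component;
    key lists the unordered group pairs indexing mu for X_ij, X_ik and X_jk.
    """
    comps = []
    for q, l, m in itertools.product(range(len(pi)), repeat=3):
        key = (tuple(sorted((q, l))), tuple(sorted((q, m))), tuple(sorted((l, m))))
        comps.append((key, pi[q] * pi[l] * pi[m]))
    return comps


def component_type_counts(pi):
    """Count components by the number of distinct groups among nodes i, j, k."""
    return Counter(len({g for pair in key for g in pair})
                   for key, _ in triplet_mixture_components(pi))
```

With Q = 3, for example, the counts are 3, 18 and 6 components with one, two and three distinct groups, matching the three bullets.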

By an old result due to Teicher (1967), the identifiability of finite mixtures of some 
family of distributions is equivalent to identifiability of finite mixtures of (multivariate) 
product distributions from this same family. In addition, identifiability of continuous 
univariate parametric mixtures is generally well understood (Teicher, 1961, 1963). Thus, 
we introduce the following assumptions. 

Assumption 1. The $Q(Q+1)/2$ parameter values $\theta_{ql}$, $1 \le q \le l \le Q$, are distinct.

Assumption 2. The family of measures $\mathcal{M} = \{F(\cdot, \theta) \mid \theta \in \Theta\}$ satisfies

i) all elements $F(\cdot, \theta)$ have no point mass at 0,

ii) the parameters of finite mixtures of measures in $\mathcal{M}$ are identifiable, up to label swapping. In other words, for any integer $m \ge 1$,

$$\sum_{i=1}^{m} a_i F(\cdot, \theta_i) = \sum_{i=1}^{m} a_i' F(\cdot, \theta_i') \implies \sum_{i=1}^{m} a_i \delta_{\theta_i} = \sum_{i=1}^{m} a_i' \delta_{\theta_i'},$$

where $\delta_\theta$ denotes the Dirac mass at $\theta$.

Remark. Note that most of the classical parametric families satisfy this assumption. In particular, the truncated Poisson, Gaussian and Laplace families $\{F(\cdot, \theta), \theta \in \Theta\}$ satisfy Assumption 2 (see e.g., McLachlan and Peel, 2000; Teicher, 1961, 1963).

Theorem 12. Under Assumptions 1 and 2, the parameters $\pi$, $\theta_{ql}$, $p_{ql}$, $1 \le q \le l \le Q$, of the parametric random graph mixture model with weighted edge variables are identifiable, up to label swapping, from the distribution of $K_3$.

The previous result is not applicable to the parametric affiliation model, for which the set $\{\theta_{ql}, 1 \le q \le l \le Q\}$ reduces to $\{\theta_{\text{in}}, \theta_{\text{out}}\}$, so Assumption 1 is violated. However, in this case a similar argument again yields a full identifiability result. As suggested by Proposition 9, we use Q nodes to identify the group priors.

Theorem 13. Under Assumption 2, the parameters $\alpha$, $\beta$, $\theta_{\text{in}}$, $\theta_{\text{out}}$ of the parametric affiliation random graph mixture model with weighted edge variables are strictly identifiable from the distribution of $K_3$ provided $\theta_{\text{in}} \neq \theta_{\text{out}}$. Once these have been identified, the group priors $\pi$ can further be identified, up to label swapping, from the distribution of $K_Q$.

A similar approach to that of this theorem has been successfully used by Ambroise and Matias (2010) to estimate the parameters of these models. They first estimated the sparsity parameters $\alpha$ and $\beta$ from the induced binary edge state model, but a procedure based on the preceding theorems would not require $\alpha$ and $\beta$ to be distinct.

We turn next to models in which the edge variables take a finite number $\kappa$ of values. Our primary reason for investigating such models is the role they play in our analysis of models with non-parametric conditional distributions of edge weights, in Section 4.2. Thus we limit our investigation to the single result we need there.

Theorem 14. The parameters of the random graph mixture model, with $\kappa$-state edge variables and $Q \ge 2$ latent groups, are identifiable, up to label swapping, from the distribution of $K_9$, provided $\kappa \ge \binom{Q+1}{2}$ and the $\kappa$-entry vectors $\{p_{ql}\}_{1 \le q \le l \le Q}$ are linearly independent.

Note that the condition given here on the number of edge states is likely far from optimal. In the case $Q = 2$ the condition requires at least $\kappa = 3$ edge states, whereas we know from Theorem 1 that the parameters are identifiable for this $Q$ with only $\kappa = 2$ edge states.

4.2. The non-parametric case 

In the most general case of non-parametric distributions, our arguments for identifiability depend on binning the values of the edge variables into a finite set. We then apply Theorem 14 to this discretization, to obtain the following.
Theorem 15. The parameters $\{\pi_q, \mu_{ql} = (1 - p_{ql})\delta_0 + p_{ql} F_{ql} : 1 \le q \le l \le Q\}$ of the random graph weighted non-parametric mixture model are identifiable, up to label swapping, from the distribution of $K_9$ provided the measures $\mu_{ql}$, $1 \le q \le l \le Q$, are linearly independent.

5. Proofs 

5.1. Method of proofs based on Kruskal's theorem 

In this section we review Kruskal's theorem and describe our technique for employing 
it in the proofs of Theorems 2 and 14. 

Kruskal's result. We first present Kruskal's result in a statistical context. Consider a latent random variable $V$ with state space $\{1, \ldots, r\}$ and distribution given by the column vector $v = (v_1, \ldots, v_r)$. Assume that there are three observable random variables $U_j$ for $j = 1, 2, 3$, each with finite state space $\{1, \ldots, \kappa_j\}$. The $U_j$'s are moreover assumed to be independent conditional on $V$. Let $M_j$, $j = 1, 2, 3$, be the stochastic matrix of size $r \times \kappa_j$ whose $i$th row is $m_i^j = \mathbb{P}(U_j = \cdot \mid V = i)$. Then consider the $\kappa_1 \times \kappa_2 \times \kappa_3$ tensor $[v; M_1, M_2, M_3]$ defined by

$$[v; M_1, M_2, M_3] = \sum_{i=1}^{r} v_i\, m_i^1 \otimes m_i^2 \otimes m_i^3.$$

Thus $[v; M_1, M_2, M_3]$ is a 3-dimensional array whose $(s, t, u)$ element is

$$[v; M_1, M_2, M_3]_{s,t,u} = \sum_{i=1}^{r} v_i\, m_i^1(s)\, m_i^2(t)\, m_i^3(u) = \mathbb{P}(U_1 = s, U_2 = t, U_3 = u),$$

for any $1 \le s \le \kappa_1$, $1 \le t \le \kappa_2$, $1 \le u \le \kappa_3$. Note that $[v; M_1, M_2, M_3]$ is left unchanged by simultaneously permuting the rows of all the $M_j$ and the entries of $v$, as this corresponds to permuting the labels of the latent classes. Knowledge of the distribution of $(U_1, U_2, U_3)$ is equivalent to knowledge of the tensor $[v; M_1, M_2, M_3]$.
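For readers who want to experiment, the tensor $[v; M_1, M_2, M_3]$ can be formed numerically with a single `numpy.einsum` call. The sketch below is only an illustration: the sizes, the random parameters and the helper name `kruskal_tensor` are choices made here, not part of the original argument. It also shows that relabelling the latent classes leaves the tensor unchanged.

```python
import numpy as np

def kruskal_tensor(v, M1, M2, M3):
    """Return the kappa1 x kappa2 x kappa3 array [v; M1, M2, M3].

    Entry (s, t, u) is sum_i v[i] * M1[i, s] * M2[i, t] * M3[i, u],
    i.e. the joint probability P(U1 = s, U2 = t, U3 = u).
    """
    return np.einsum("i,is,it,iu->stu", v, M1, M2, M3)

rng = np.random.default_rng(0)
r, k1, k2, k3 = 3, 4, 4, 5
v = rng.dirichlet(np.ones(r))              # distribution of the latent V
M1 = rng.dirichlet(np.ones(k1), size=r)    # row i is P(U1 = . | V = i)
M2 = rng.dirichlet(np.ones(k2), size=r)
M3 = rng.dirichlet(np.ones(k3), size=r)

T = kruskal_tensor(v, M1, M2, M3)

# Relabel the latent classes: permute v and the rows of every M_j together.
perm = np.array([2, 0, 1])
T_perm = kruskal_tensor(v[perm], M1[perm], M2[perm], M3[perm])
```

Since $T$ is a joint probability table, its entries sum to one, and `T_perm` coincides with `T`: this invariance is exactly the label-swapping ambiguity that Kruskal's theorem cannot remove.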

To state Kruskal's result, we need some algebraic terminology. For a matrix $M$, the Kruskal rank of $M$ will mean the largest number $I$ such that every set of $I$ rows of $M$ is linearly independent. Note that this concept would change if we replaced "row" by "column," but we only use the row version in this article. With the Kruskal rank of $M$ denoted by $\operatorname{rank}_K M$, we have

$$\operatorname{rank}_K M \le \operatorname{rank} M,$$

and equality of rank and Kruskal rank does not hold in general. However, in the particular case when a matrix $M$ of size $p \times q$ has rank $p$, it also has Kruskal rank $p$.
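The Kruskal rank can be computed by brute force on small matrices, which makes the gap between rank and Kruskal rank easy to see. This is a minimal sketch (exponential in the number of rows, with a numerical tolerance `tol` chosen here); the function name `kruskal_rank` is hypothetical.

```python
from itertools import combinations
import numpy as np

def kruskal_rank(M, tol=1e-10):
    """Largest I such that EVERY set of I rows of M is linearly independent."""
    n_rows = M.shape[0]
    best = 0
    for size in range(1, n_rows + 1):
        if all(np.linalg.matrix_rank(M[list(rows)], tol=tol) == size
               for rows in combinations(range(n_rows), size)):
            best = size
        else:
            break  # any larger set contains a dependent subset
    return best

# Rank 2, but rows 0 and 1 are identical, so the Kruskal rank is only 1.
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.1, 0.2, 0.7]])
```

The early `break` is valid because a set of rows containing a dependent subset is itself dependent.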
The fundamental algebraic result of Kruskal is the following. 

Theorem 16 (Kruskal, 1976, 1977; see also Rhodes, 2010). Let $I_j = \operatorname{rank}_K M_j$. If

$$I_1 + I_2 + I_3 \ge 2r + 2, \qquad (4)$$

then $[v; M_1, M_2, M_3]$ uniquely determines $v$ and the $M_j$, up to simultaneous permutation of the rows. In other words, the set of parameters $\{(v, \mathbb{P}(U_j = \cdot \mid V))\}$ is uniquely identified, up to label swapping, from the distribution of the random variables $(U_1, U_2, U_3)$.
Now, it will be useful to note that condition (4) holds for generic choices of the $M_j$, provided the $\kappa_j$ are large enough to allow it. More precisely, Kruskal's condition on the sum of Kruskal ranks can be expressed through a Boolean combination of polynomial inequalities ($\neq$) involving matrix minors in the parameters. If we show there is even a single choice of parameters for which Kruskal's condition is satisfied, then the algebraic variety of parameters for which it does not hold is a proper subvariety (defined by negating the polynomial condition above, and so by a Boolean combination of equalities) of parameter space. As proper subvarieties are necessarily of Lebesgue measure zero, it follows that the Kruskal condition holds generically.
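A quick numerical illustration of this genericity statement (not a proof; the dimensions and random seed are arbitrary): randomly drawn stochastic matrices with $\kappa_j \ge r$ have full row rank, hence Kruskal rank $r$ by the remark following the definition of Kruskal rank, and condition (4) then reads $3r \ge 2r + 2$.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 4
kappas = (4, 5, 6)          # each kappa_j >= r, so full row rank r is possible

ranks = []
for kappa in kappas:
    # A random stochastic r x kappa matrix plays the role of a "generic" M_j.
    M = rng.dirichlet(np.ones(kappa), size=r)
    ranks.append(int(np.linalg.matrix_rank(M)))

# Full row rank r implies Kruskal rank r, so these ranks are the I_j.
kruskal_sum = sum(ranks)
condition_4 = kruskal_sum >= 2 * r + 2
```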

Our proof strategy for showing identifiability of certain random graph mixture models 
is to embed them in the model we just described. Applying Kruskal's result to the 
embedded model, we derive partial identifiability results on the embedded model, and 
then, using details of the embedding, relate these to the original model. 

Embedding the random graph mixture model into Kruskal's context. Let $\kappa$ denote the cardinality of $\mathcal{X}$, in either the binary state case or the general finite state case.

To place the random graph mixture model in the context of Theorem 16, we define a composite hidden variable and three composite observed variables that reflect the conditional independence structure integral to Kruskal's theorem.

For some $n$ (to be determined), let $V = (Z_1, Z_2, \ldots, Z_n)$ be the latent random variable, with state space $\{1, \ldots, Q\}^n$, which describes the state of all $n$ nodes collectively, and denote by $v$ the corresponding vector of its probability distribution. Note that the entries of $v$ are of the form $\pi_1^{n_1} \cdots \pi_Q^{n_Q}$ with $n_q \ge 0$ and $\sum_q n_q = n$.

The observed variables will correspond to three pairwise disjoint subsets $G_1, G_2, G_3$ of the complete set of edges of $K_n$. By choosing the $G_i$ to have no edges in common, we ensure their conditional independence.

The construction of the sets of edges $G_i$ proceeds in two steps. We begin by considering a small complete graph, and an associated matrix: for a subset of $m$ nodes, we define a $Q^m \times \kappa^{\binom{m}{2}}$ matrix $A$, with rows indexed by assignments $\mathcal{I} \in \{1, \ldots, Q\}^m$ of states to these $m$ nodes, columns indexed by the state space of the complete set of $\binom{m}{2}$ edges between them, and entries giving the probability of observing the specified states on all edges, conditioned on the specified node states. In the case $\kappa = 2$, it is helpful to note that each column index corresponds to a different graph on the $m$ nodes, composed of those edges assigned state 1. For larger $\kappa$ one may similarly associate to a column index a $\kappa$-coloring of the edges of the complete graph. We therefore refer to a column index as a configuration.
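In the binary case ($\kappa = 2$) the matrix $A$ can be tabulated directly by enumerating node states and edge configurations. The following sketch assumes a symmetric $Q \times Q$ array of connectivities $p_{ql}$; the function name `configuration_matrix` and the example values are illustrative only. Each row of $A$ is a probability distribution over configurations, so its entries sum to one.

```python
from itertools import combinations, product
import numpy as np

def configuration_matrix(p, m):
    """Q^m x 2^C(m,2) matrix A for the binary model on m nodes.

    p is a symmetric Q x Q array with p[q, l] = P(X_ij = 1 | Z_i = q, Z_j = l).
    Row I (a tuple of node states) and column x (a 0/1 state for every edge)
    hold P(edges in configuration x | node states I).
    """
    Q = p.shape[0]
    edges = list(combinations(range(m), 2))
    states = list(product(range(Q), repeat=m))
    configs = list(product((0, 1), repeat=len(edges)))
    A = np.ones((len(states), len(configs)))
    for a, I in enumerate(states):
        for b, x in enumerate(configs):
            for (i, j), x_ij in zip(edges, x):
                prob = p[I[i], I[j]]
                A[a, b] *= prob if x_ij == 1 else 1.0 - prob
    return A

p = np.array([[0.8, 0.3],
              [0.3, 0.6]])
A = configuration_matrix(p, m=3)
```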

In the step we call the base case, we exhibit a value of m such that this matrix A 
generically has full row rank. 

Then, an extension step builds on the base case, in order to construct a larger set of $n$ nodes which will be used in the application of Theorem 16. This is accomplished by means of Lemma 16 (and the subsequent remark) of Allman et al. (2009), which we paraphrase as follows.
Lemma 17. Suppose that, for the $Q$-node-state model, the number of nodes $m$ is such that the $Q^m \times \kappa^{\binom{m}{2}}$ matrix $A$ of probabilities of observing configurations of $K_m$ conditioned on node state assignments has rank $Q^m$. Then with $n = m^2$ there exist pairwise disjoint subsets $G_1, G_2, G_3$ of the complete set of edges of $K_n$ such that for each $G_i$ the $Q^n \times \kappa^{|G_i|}$ matrix $M_i$ of probabilities of observing configurations of $G_i$ conditioned on node state assignments has rank $Q^n$.

In our applications here, we only determine that $A$ has full row rank generically. Hence the Lemma only allows us to conclude that the $M_i$ have full row rank generically, and hence have Kruskal rank $Q^n$ generically.

We also note (for use in the proof of Theorems 2 and 14) that in the construction of the Lemma, each subset $G_i$ is the union of $m$ complete sets of edges each over $m$ different nodes, and thus contains $m\binom{m}{2}$ edges. In particular, if $m \ge 3$, then $G_i$ contains a complete graph on 3 nodes.
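The extension step relies on Lemma 16 of Allman et al. (2009). As a hedged illustration only (not necessarily the construction used there), one simple way to obtain three pairwise disjoint edge sets, each a union of $m$ complete graphs on $m$ nodes, is to place the $n = m^2$ nodes on an $m \times m$ grid and take cliques on the rows, on the columns, and on the cyclic diagonals: two distinct nodes can share at most one of these blocks. The function name `grid_edge_sets` is hypothetical.

```python
from itertools import combinations

def grid_edge_sets(m):
    """Three pairwise disjoint edge sets on n = m*m nodes. Each is a union of
    m complete graphs on m nodes: rows, columns and cyclic diagonals of an
    m x m grid whose cell (i, j) is node i*m + j."""
    rows = [[i * m + j for j in range(m)] for i in range(m)]
    cols = [[i * m + j for i in range(m)] for j in range(m)]
    diags = [[i * m + (i + c) % m for i in range(m)] for c in range(m)]
    return [
        {frozenset(edge) for block in partition for edge in combinations(block, 2)}
        for partition in (rows, cols, diags)
    ]

m = 3
G1, G2, G3 = grid_edge_sets(m)
```

Each set has $m\binom{m}{2}$ edges and, for $m \ge 3$, contains a triangle, matching the two properties used in the proofs of Theorems 2 and 14.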

Application of Kruskal's theorem to the embedded model and conclusion. Next, with $v, M_1, M_2, M_3$ defined by the embedding given in the previous paragraphs, we apply Kruskal's Theorem (Theorem 16) to the table $[v; M_1, M_2, M_3]$. Knowledge of the distribution of the random graph mixture model over $n$ nodes implies knowledge of this 3-dimensional table. By our construction of the $M_i$, condition (4) is satisfied since $3Q^n \ge 2Q^n + 2$. Thus the vector $v$ and the matrices $M_1, M_2, M_3$ are uniquely determined, up to simultaneous permutation of the rows.

With these embedded parameters in hand, it is still necessary to recover the initial 
parameters of the random graph mixture model: the group proportions and the connec- 
tivity vectors. As this requires a rather detailed argument, we leave its exposition for a 
specific application. 

Finally, we note that by discretizing continuous variables, this approach to establish- 
ing identifiability may also be used in the case of continuous connectivity distributions. 

5.2. Proof of Theorem 2

This proof follows the strategy described in the previous section. We use the notation

$$\bar p_{ql} = \mathbb{P}(X_{ij} = 0 \mid Z_i = q, Z_j = l) = 1 - p_{ql}.$$

Base case. The initial step consists in finding a value of $m$ such that the matrix $A$ of size $Q^m \times 2^{\binom{m}{2}}$ containing the probabilities of the configurations over these $m$ nodes, conditional on the hidden node states, generically has full row rank.

The condition of having full row rank can be expressed as the non-vanishing of at least one $Q^m \times Q^m$ minor of $A$. Composing the map sending $\{p_{ql}\} \mapsto A$ with this collection of minors gives polynomials in the parameters of the model. To see that these polynomials are not identically zero, and thus are non-zero for generic parameters, it is enough to exhibit a single choice of the $\{p_{ql}\}$ for which the corresponding matrix $A$ has full row rank.

With this in mind, we choose to consider $\{p_{ql}\}$ of the form $p_{ql} = s_q s_l/(s_q s_l + t_q t_l)$, so $\bar p_{ql} = t_q t_l/(s_q s_l + t_q t_l)$, with $s_i, t_j > 0$ to be chosen later. Moreover, since the property of having full row rank is unchanged under non-zero rescaling of the rows of the matrix $A$, and all entries of $A$ are monomials with total degree $\binom{m}{2}$ in $\{p_{ql}, \bar p_{ql}\}$, we may simplify the entries of $A$ by removing denominators, and consider the matrix (also called $A$) with entries in terms of $p_{ql} = s_q s_l$ and $\bar p_{ql} = t_q t_l$.

The rows of $A$ are indexed by the composite node states $\mathcal{I} \in \{1, \ldots, Q\}^m$, while its columns are indexed by the edge configurations $\{0, 1\}^{\binom{m}{2}}$. For any composite hidden state $\mathcal{I} \in \{1, \ldots, Q\}^m$ and any vertex $v \in \{1, \ldots, m\}$, let $\mathcal{I}(v) \in \{1, \ldots, Q\}$ denote the state of vertex $v$ in the composite state $\mathcal{I}$. With our particular choice of the parameters $p_{ql}$, the $(\mathcal{I}, (x_{ij})_{1 \le i < j \le m})$-entry of $A$ is given by

$$\prod_{1 \le v \le m} s_{\mathcal{I}(v)}^{d_v}\, t_{\mathcal{I}(v)}^{m-1-d_v},$$

where $d_v = \sum_{w \ne v} x_{vw}$ is the degree of node $v$ in the graph associated to the configuration $(x_{ij})_{1 \le i < j \le m}$. Note that the entries in a column of $A$ are now determined by the degree sequence $d = (d_v)_{1 \le v \le m}$ associated to the configuration.
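The reduction of each entry to a function of the degree sequence can be checked by brute force on a small graph. In this sketch the values of $s_q, t_q$, the choice $m = 4$, $Q = 2$, and the helper names are arbitrary; the two functions compute the same entry of the rescaled matrix $A$ edge by edge and node by node.

```python
from itertools import combinations, product
import math

def entry_by_edges(I, x, edges, s, t):
    """Rescaled entry: product over edges of s_q s_l (edge present)
    or t_q t_l (edge absent), where q, l are the states of its endpoints."""
    val = 1.0
    for (i, j), x_ij in zip(edges, x):
        val *= s[I[i]] * s[I[j]] if x_ij else t[I[i]] * t[I[j]]
    return val

def entry_by_degrees(I, x, edges, s, t, m):
    """Same entry written as prod_v s_{I(v)}^{d_v} t_{I(v)}^{m-1-d_v}."""
    deg = [0] * m
    for (i, j), x_ij in zip(edges, x):
        deg[i] += x_ij
        deg[j] += x_ij
    return math.prod(s[I[v]] ** deg[v] * t[I[v]] ** (m - 1 - deg[v]) for v in range(m))

m, Q = 4, 2
s, t = [0.7, 1.3], [1.1, 0.4]          # arbitrary positive values
edges = list(combinations(range(m), 2))
max_gap = max(
    abs(entry_by_edges(I, x, edges, s, t) - entry_by_degrees(I, x, edges, s, t, m))
    for I in product(range(Q), repeat=m)
    for x in product((0, 1), repeat=len(edges))
)
```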

In general, there is a many-to-one correspondence of configurations to their degree sequences. (E.g., for $m = 4$ nodes, the configuration with edges $(1,2)$ and $(3,4)$ in state 1, and that with edges $(1,3)$ and $(2,4)$ in state 1, both have degree sequence $(1, 1, 1, 1)$.) Thus if $m > 3$, there will be several identical columns in $A$. For any degree sequence $d = (d_v)_{1 \le v \le m}$ arising from an $m$-node graph, let $A_d$ denote a corresponding column of $A$.



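Now, for each vertex $v \in \{1, \ldots, m\}$ and each $q \in \{1, \ldots, Q\}$, introduce an indeterminate $U_{v,q}$ and a $Q^m$-entry row vector $U = \big(\prod_{1 \le v \le m} U_{v,\mathcal{I}(v)}\big)_{\mathcal{I} \in \{1, \ldots, Q\}^m}$. For each degree sequence $d$, we have

$$U A_d = \sum_{\mathcal{I} \in \{1, \ldots, Q\}^m} \prod_{1 \le v \le m} s_{\mathcal{I}(v)}^{d_v}\, t_{\mathcal{I}(v)}^{m-1-d_v}\, U_{v,\mathcal{I}(v)} = \prod_{1 \le v \le m} \Big( \sum_{q=1}^{Q} s_q^{d_v}\, t_q^{m-1-d_v}\, U_{v,q} \Big).$$

To verify this, notice that each monomial $(s_{i_1}^{d_1} t_{i_1}^{m-1-d_1} U_{1,i_1}) \cdots (s_{i_m}^{d_m} t_{i_m}^{m-1-d_m} U_{m,i_m})$ obtained from multiplying out the product on the right corresponds to a choice of node states $i_v$ for nodes $v$, and hence a vector $\mathcal{I} = (i_1, \ldots, i_m)$. Moreover, we obtain one such summand for each $\mathcal{I}$.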
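The factorization of $U A_d$ can likewise be checked numerically by assigning random values to the indeterminates $U_{v,q}$. The parameters, seed and degree sequence below are arbitrary choices for illustration.

```python
from itertools import product
import math
import numpy as np

rng = np.random.default_rng(2)
m, Q = 4, 3
s, t = rng.uniform(0.5, 1.5, Q), rng.uniform(0.5, 1.5, Q)
U = rng.normal(size=(m, Q))            # values for the indeterminates U_{v,q}
d = (1, 2, 2, 1)                       # a degree sequence on m = 4 nodes

# Left-hand side: sum over all composite states I of the monomials.
lhs = sum(
    math.prod(s[I[v]] ** d[v] * t[I[v]] ** (m - 1 - d[v]) * U[v, I[v]] for v in range(m))
    for I in product(range(Q), repeat=m)
)
# Right-hand side: product over nodes of a sum over that node's state.
rhs = math.prod(
    sum(s[q] ** d[v] * t[q] ** (m - 1 - d[v]) * U[v, q] for q in range(Q))
    for v in range(m)
)
```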

In order to prove that the matrix $A$ has full row rank, it is enough to exhibit $Q^m$ independent columns of $A$. Note, however, that independence of a set of columns $\{A_d\}$ is equivalent to the independence of the corresponding set of polynomial functions $\{U A_d\}$ in the indeterminates $\{U_{v,q}\}$.

Now for a set $\mathcal{D}$ of degree sequences, to prove that the polynomials $\{U A_d\}_{d \in \mathcal{D}}$ are independent, we assume that there exist scalars $a_d$ such that

$$\sum_{d \in \mathcal{D}} a_d\, U A_d = 0, \qquad (5)$$

and show that necessarily all $a_d = 0$. To this aim, we prove the following lemma.

Lemma 18. Suppose $Q \le m$. Let $\mathcal{D}$ be a set of degree sequences such that for each node $v \in \{1, \ldots, m\}$, the set of degrees $\{d_v \mid d \in \mathcal{D}\}$ has cardinality at most $Q$. Then for generic values of $s_i, t_j$, for each $v$ and each $d^* \in \{d_v \mid d \in \mathcal{D}\}$ there exist values of the indeterminates $\{U_{v,q}\}_{1 \le q \le Q}$ that annihilate all the polynomials $U A_d$ for $d \in \mathcal{D}$ except those for which $d_v = d^*$.

Proof. Fix a node $v$ and let $\{d^1, \ldots, d^Q\}$ be any set of $Q$ distinct integers with

$$\{d_v \mid d \in \mathcal{D}\} \subseteq \{d^1, \ldots, d^Q\} \subseteq \{0, 1, \ldots, m-1\}.$$

Let $M$ be the $Q \times Q$ matrix with $i$th row $(s_1^{d^i} t_1^{m-1-d^i}, \ldots, s_Q^{d^i} t_Q^{m-1-d^i})$. Since all the integers $d^i$ are different, the matrix $M$ has full row rank for generic choices of $s_i, t_j$. (One way to see this is to consider an $m \times m$ Vandermonde matrix, with rows indexed by $k \in \{0, \ldots, m-1\}$ and $(k, l)$-entry $u_l^k$. Choosing distinct values of the $u_l$, this has full rank, and thus the $Q \times m$ submatrix composed of rows with indices $\{d^i\}$ has rank $Q$. But then $Q$ of the columns can be chosen so that the $Q \times Q$ submatrix has full rank. Letting the $s_i$ be the values of $u_l$ in these columns, and $t_j = 1$, gives one choice for which the matrix $M$ has full rank.)

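Note $d^* = d^k$ for some $k$, and let $e_k$ be the $Q$-entry vector of all zeros except for a 1 in the $k$th position. Then for generic $s_i, t_j$, the equation

$$M\,(U_{v,1}, \ldots, U_{v,Q})^T = e_k$$

admits a unique solution, one that corresponds to the above-mentioned choice of indeterminates $\{U_{v,q}\}_{1 \le q \le Q}$. Indeed, for this choice the factor $\sum_{q=1}^{Q} s_q^{d_v} t_q^{m-1-d_v} U_{v,q}$ of $U A_d$ is the entry of $M(U_{v,1}, \ldots, U_{v,Q})^T$ indexed by $d_v$, which equals 1 if $d_v = d^*$ and 0 otherwise. □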
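The proof of Lemma 18 is constructive, and the construction can be replayed numerically for one node $v$: build $M$, solve $M(U_{v,1}, \ldots, U_{v,Q})^T = e_k$, and evaluate the node-$v$ factor $\sum_q s_q^{d_v} t_q^{m-1-d_v} U_{v,q}$ of $U A_d$ for each admissible degree. The values of $s_q, t_q$ and the degree set are choices made here: positive values with distinct ratios $s_q/t_q$, for which this generalized Vandermonde-type matrix $M$ is invertible.

```python
import numpy as np

m, Q = 6, 3
s = np.array([0.6, 1.0, 1.4])          # positive, with distinct ratios s_q / t_q
t = np.array([1.2, 0.9, 1.0])
degrees = [1, 2, 4]                    # Q distinct integers d^1, ..., d^Q in {0, ..., m-1}

# Row i of M is (s_q^{d^i} t_q^{m-1-d^i})_{q = 1..Q}.
M = np.array([[s[q] ** di * t[q] ** (m - 1 - di) for q in range(Q)] for di in degrees])

k = 1                                  # target degree d* = degrees[k]
U_v = np.linalg.solve(M, np.eye(Q)[k]) # values assigned to U_{v,1}, ..., U_{v,Q}

# Node-v factor of U A_d, for each admissible value of d_v.
factor = {di: float(sum(s[q] ** di * t[q] ** (m - 1 - di) * U_v[q] for q in range(Q)))
          for di in degrees}
```

The factor equals 1 for $d_v = 2$ and vanishes for the two other degrees, so every $U A_d$ with $d_v \ne 2$ is annihilated.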

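Now consider the following collection

$$\mathcal{D} = \Big\{ (d_1, \ldots, d_m) \;\Big|\; d_v \in \{1, 2, \ldots, Q\} \text{ for } v \le m-1, \text{ and if } \sum_{v=1}^{m-1} d_v \text{ is even then } d_m \in \{0, 2, 4, \ldots, 2Q-2\}, \text{ otherwise } d_m \in \{1, 3, 5, \ldots, 2Q-1\} \Big\}.$$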

Note that $\mathcal{D}$ has $Q^m$ elements and satisfies the assumption of Lemma 18 on the number of different values per coordinate. Moreover, if we establish, as we do below, that its elements are realizable as degree sequences of graphs over $m$ nodes, then by choosing one column of $A$ associated to each degree sequence in $\mathcal{D}$, we obtain a collection of $Q^m$ different columns of $A$. These columns are independent since for each sequence $d^* \in \mathcal{D}$, by Lemma 18 we can choose values of the indeterminates $\{U_{v,q}\}_{1 \le v \le m, 1 \le q \le Q}$ such that all polynomials $U A_d$ vanish, except $U A_{d^*}$, leading to $a_{d^*} = 0$ in equation (5).

That each sequence $d \in \mathcal{D}$ is realizable as a degree sequence of a graph over $m$ nodes follows from a result of Erdős and Gallai (1961) (see also Berge, 1976, Chapter 6, Theorem 6). Reordering the entries of $d$ so that $d_1 \ge d_2 \ge \cdots \ge d_m$, a necessary and sufficient condition for a sequence with even sum (as ensured by the parity rule defining $d_m$) to be realizable by such a graph is that for $1 \le k \le m-1$,

$$\sum_{v=1}^{k} d_v \le k(k-1) + \sum_{v=k+1}^{m} \min\{k, d_v\}. \qquad (6)$$

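From the definition of $d \in \mathcal{D}$, with coordinates reordered, it is easy to see that for any $1 \le k \le m-1$ we have

$$\sum_{v=1}^{k} d_v \le (k-1)Q + (2Q-1) \quad \text{and} \quad \sum_{v=k+1}^{m} \min\{k, d_v\} \ge m - k,$$

at least when all entries of $d$ are positive. (If $d_m = 0$, the first sum is instead at most $kQ$ and the second at least $m-k-1$, which leads to the weaker requirement $-k^2 + (Q+2)k + 1 \le m$.) Thus, for (6) to be satisfied, it is enough that for any $1 \le k \le m-1$, we have

$$-k^2 + (Q+2)k + Q - 1 \le m.$$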



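But for $m$ sufficiently large,

$$\max_{1 \le k \le m-1} \{-k^2 + (Q+2)k\} = \begin{cases} \left(\frac{Q+2}{2}\right)^2 & \text{if } Q \text{ is even}, \\ \frac{(Q+1)(Q+3)}{4} & \text{if } Q \text{ is odd}. \end{cases}$$

Thus, inequality (6) is satisfied as soon as

$$m \ge \begin{cases} \left(\frac{Q+2}{2}\right)^2 + Q - 1 & \text{if } Q \text{ is even}, \\ \frac{(Q+1)(Q+3)}{4} + Q - 1 & \text{if } Q \text{ is odd}. \end{cases}$$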

This concludes the proof of the base case. 

The extension step explained in Section 5.1 then applies, so that with $n = m^2$, Kruskal's Theorem may be applied to identify, up to simultaneous row permutation, $v$, $M_1$, $M_2$, and $M_3$ as defined in that section.

Conclusion. The entries of $v$ obtained via Kruskal's theorem applied to the embedded model are of the form $\pi_1^{n_1} \cdots \pi_Q^{n_Q}$ with $\sum_q n_q = n$, while the entries of the $M_i$ contain information on the $p_{ql}$. Although the ordering of the rows of the $M_i$ is arbitrary, crucially we do know how the rows of $M_i$ are paired with the entries of $v$.

By focusing on one of the matrices, say $M_1$, and adding appropriate columns to marginalize to a single edge variable (e.g., all columns for configurations with $X_{12} = 1$), we recover the set of values $\{p_{ql}\}_{1 \le q \le l \le Q}$, but without order. However, if row $k$ of $M_1$ corresponds to the unknown node states $\mathcal{I}$, then performing such marginalizations for each of the 3 edges of a complete graph $C$ on 3 nodes contained in $G_1$ recovers the set

$$R_k = \{p_{ql} \mid \text{for some edge } (v, w) \in C,\ \{\mathcal{I}(v), \mathcal{I}(w)\} = \{q, l\}\}.$$

By considering the cardinalities of the sets $R_k$ in the generic case of all $p_{ql}$ distinct, we can now determine individual parameters.

Consider first those $k$ for which $R_k$ has one element. There are exactly $Q$ of these, arising from all 3 nodes being in the same group. Thus for such $k$, $R_k = \{p_{qq}\}$ and $v_k = \pi_q$. Choosing an arbitrary labeling, we have determined all $\pi_q$ and $p_{qq}$.

Next consider those $k$ for which $R_k$ has two elements. These arise from 2 nodes being in the same group, with the other node in a different group, so $R_k = \{p_{qq}, p_{ql}\}$ for some $l \ne q$. However, having already determined the $p_{qq}$, and since generically the $p_{ql}$ are distinct, we can find exactly two such $k_1$ and $k_2$ of the form $R_{k_1} = \{p_{qq}, p_{ql}\}$ and $R_{k_2} = \{p_{ll}, p_{ql}\}$. Thus, we can also determine $p_{ql}$ for $q \ne l$.
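The combinatorial step of this conclusion, recovering the connectivities from the unordered sets $R_k$, can be sketched as follows. The sketch assumes generic (pairwise distinct) $p_{ql}$, uses all node-state triples on the triangle $C$ in place of the rows of $M_1$, labels the groups by sorting the diagonal values (one arbitrary labeling), and leaves the group proportions aside; all names are illustrative.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(4)
Q = 3
p = rng.uniform(0.05, 0.95, size=(Q, Q))
p = (p + p.T) / 2                      # symmetric, generically distinct values
triangle = [(0, 1), (0, 2), (1, 2)]    # the complete graph C on 3 nodes

# What the marginalizations deliver for each (unknown) row: the set R_k.
R = [frozenset(p[I[v], I[w]] for v, w in triangle) for I in product(range(Q), repeat=3)]

# Step 1: singletons give the diagonal values p_qq (labelled here by sorting).
diag = sorted({next(iter(r)) for r in R if len(r) == 1})
label = {val: q for q, val in enumerate(diag)}

# Step 2: a two-element set is {p_qq, p_ql}; record which diagonal groups each
# off-diagonal value appears with. Exactly two groups, q and l, occur.
partners = {}
for r in R:
    if len(r) == 2:
        (dval,) = r & set(diag)
        (oval,) = r - set(diag)
        partners.setdefault(oval, set()).add(label[dval])

p_hat = np.zeros((Q, Q))
for val, q in label.items():
    p_hat[q, q] = val
for val, groups in partners.items():
    q, l = sorted(groups)
    p_hat[q, l] = p_hat[l, q] = val
```

Up to the chosen labeling, `p_hat` reproduces `p`; three-element sets (three distinct groups) are simply not needed.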

Finally, note that all generic aspects of this argument, in the base case and the requirement that the parameters $p_{ql}$ be distinct, concern only the $p_{ql}$. Thus if the group proportions $\pi_q$ are fixed to any specific values, the theorem remains valid. □
5.3. Proofs relying on moment equations

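Proof of Proposition 3. Focusing on $Q+1$ nodes, let $Z = (Z_1, \ldots, Z_{Q+1})$ denote the composite node random variable, and $z = (z_1, \ldots, z_{Q+1})$ any realization of $Z$. Note that

$$U_Q(X) = \sum_{z \in \{1, \ldots, Q\}^{Q+1}} \Big(\prod_{1 \le k \le Q+1} \pi_{z_k}\Big)\, \mathbb{E}\Big[\prod_{1 \le i < j \le Q+1} (X - X_{ij}) \,\Big|\, Z = z\Big] = \sum_{z \in \{1, \ldots, Q\}^{Q+1}} \Big(\prod_{1 \le k \le Q+1} \pi_{z_k}\Big) \prod_{1 \le i < j \le Q+1} \big(X - \mathbb{E}(X_{ij} \mid Z_i = z_i, Z_j = z_j)\big),$$

since conditioned on $Z = z$, the edge variables are independent. Now since there are $Q+1$ nodes and only $Q$ groups, for each term in the sum there is some $z_i = z_j$ with $i \ne j$. Since

$$X - \mathbb{E}(X_{ij} \mid Z_i = z_i, Z_j = z_j, z_i = z_j) = X - \alpha,$$

each term in the sum vanishes at $X = \alpha$, so $U_Q(\alpha) = 0$.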

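Likewise,

$$V_Q(X, Y) = \sum_{z \in \{1, \ldots, Q\}^{Q+1}} \Big(\prod_{1 \le k \le Q+1} \pi_{z_k}\Big)\, \mathbb{E}\Big[\Big(X + (Q-1)Y - \sum_{1 \le i \le Q} X_{i,Q+1}\Big) \prod_{1 \le i < j \le Q} (X - X_{ij}) \,\Big|\, Z = z\Big].$$

But

$$\mathbb{E}\Big[\Big(X + (Q-1)Y - \sum_{1 \le i \le Q} X_{i,Q+1}\Big) \prod_{1 \le i < j \le Q} (X - X_{ij}) \,\Big|\, Z = z\Big] = \Big(X + (Q-1)Y - \sum_{1 \le i \le Q} \mathbb{E}(X_{i,Q+1} \mid Z_i = z_i, Z_{Q+1} = z_{Q+1})\Big) \prod_{1 \le i < j \le Q} \big(X - \mathbb{E}(X_{ij} \mid Z_i = z_i, Z_j = z_j)\big).$$

Letting $X = \alpha$, one of the factors $X - \mathbb{E}(X_{ij} \mid Z_i = z_i, Z_j = z_j)$ will vanish for any $z$ except possibly those with the $z_i$, $1 \le i \le Q$, distinct. But in that case, $z_{Q+1} = z_i$ for exactly one value of $i \in \{1, \ldots, Q\}$, so that the first factor becomes

$$\alpha + (Q-1)Y - (Q-1)\beta - \alpha.$$

Thus in addition setting $Y = \beta$ ensures each summand is zero, so $V_Q(\alpha, \beta) = 0$.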
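Finally, the coefficient of $Y$ in $V_Q(\alpha, Y)$ is the product of $Q-1$ and

$$\sum_{z \in \{1, \ldots, Q\}^{Q}} \Big(\prod_{1 \le k \le Q} \pi_{z_k}\Big) \prod_{1 \le i < j \le Q} \big(\alpha - \mathbb{E}(X_{ij} \mid Z_i = z_i, Z_j = z_j)\big).$$

But $\prod_{1 \le i < j \le Q} \big(\alpha - \mathbb{E}(X_{ij} \mid Z_i = z_i, Z_j = z_j)\big)$ vanishes for all $z$ except possibly for those in which all $z_i$, $1 \le i \le Q$, are distinct, in which case it takes the value $(\alpha - \beta)^{\binom{Q}{2}}$. So the coefficient becomes

$$(Q-1)\, Q!\, \Big(\prod_{1 \le k \le Q} \pi_k\Big) (\alpha - \beta)^{\binom{Q}{2}}.$$

This is zero if, and only if, $\alpha = \beta$. □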
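As a numerical sanity check of Proposition 3 (a sketch, not part of the proof), $U_Q$ and $V_Q$ can be evaluated through the conditional expansions displayed above, by enumerating all $z \in \{1, \ldots, Q\}^{Q+1}$ for the binary affiliation model (groups are labelled from 0 in the code). The prior and connectivity values and the helper names are arbitrary choices made here. With $Q = 3$, $\pi = (0.5, 0.3, 0.2)$, $\alpha = 0.7$, $\beta = 0.2$, the coefficient of $Y$ should equal $2 \cdot 3! \cdot 0.03 \cdot 0.5^3 = 0.045$.

```python
from itertools import combinations, product
import math

def affiliation_p(q, l, alpha, beta):
    """E(X_ij | Z_i = q, Z_j = l) in the binary affiliation model."""
    return alpha if q == l else beta

def U(X, pi, alpha, beta):
    """U_Q(X) = sum_z (prod_k pi_{z_k}) prod_{i<j<=Q+1} (X - E[X_ij | z])."""
    Q = len(pi)
    total = 0.0
    for z in product(range(Q), repeat=Q + 1):
        weight = math.prod(pi[zk] for zk in z)
        total += weight * math.prod(
            X - affiliation_p(z[i], z[j], alpha, beta)
            for i, j in combinations(range(Q + 1), 2))
    return total

def V(X, Y, pi, alpha, beta):
    """V_Q(X, Y) expanded with the same conditional-independence argument."""
    Q = len(pi)
    total = 0.0
    for z in product(range(Q), repeat=Q + 1):
        weight = math.prod(pi[zk] for zk in z)
        first = X + (Q - 1) * Y - sum(affiliation_p(z[i], z[Q], alpha, beta)
                                      for i in range(Q))
        rest = math.prod(X - affiliation_p(z[i], z[j], alpha, beta)
                         for i, j in combinations(range(Q), 2))
        total += weight * first * rest
    return total

pi, alpha, beta = [0.5, 0.3, 0.2], 0.7, 0.2
coef_Y = V(alpha, 1.0, pi, alpha, beta) - V(alpha, 0.0, pi, alpha, beta)
```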



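Proof of Theorem 4. Since $\alpha$ is a real root of the cubic polynomial $U_2(X)$, to show $\alpha$ is uniquely identifiable it is enough to show that $U_2$ is strictly increasing. Now

$$\frac{d}{dX} U_2(X) = 3X^2 - 6m_1 X + 3m_2 = 3\big((X - m_1)^2 + (m_2 - m_1^2)\big),$$

and $m_2 - m_1^2 \ge 0$ because, using the Cauchy-Schwarz inequality,

$$m_2 = \mathbb{E}(X_{ij} X_{ik}) = \mathbb{E}[\mathbb{E}(X_{ij} \mid Z_i)\, \mathbb{E}(X_{ik} \mid Z_i)] = \mathbb{E}[\mathbb{E}(X_{ij} \mid Z_i)^2] \ge [\mathbb{E}(\mathbb{E}(X_{ij} \mid Z_i))]^2 = m_1^2.$$

Hence the derivative is nonnegative and vanishes at most at the single point $X = m_1$, so $U_2$ is strictly increasing and has a unique real root.

With $\alpha$ identified, since $\alpha \ne \beta$, we may uniquely recover $\beta$ as the root of the linear polynomial $V_2(\alpha, Y)$ with nonzero leading coefficient.

□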
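For $Q = 2$, expanding $U_2(X) = \mathbb{E}[(X - X_{12})(X - X_{13})(X - X_{23})]$ by exchangeability gives the cubic $X^3 - 3m_1X^2 + 3m_2X - m_3$ whose derivative is used above, and expanding $V_2(X, Y) = \mathbb{E}[(X + Y - X_{13} - X_{23})(X - X_{12})]$ gives $(X + Y)(X - m_1) - 2m_1X + 2m_2$. These expansions are derived here for the illustration. The sketch below (moments computed by enumerating node groups; parameter values arbitrary; it assumes $\alpha \ne m_1$, which holds when $\alpha \ne \beta$ and both groups have positive prior) recovers $\alpha$ as the unique real root and then $\beta$ from the linear equation.

```python
from itertools import product
import numpy as np

def moments(pi, p):
    """m1 = E X_12, m2 = E X_12 X_13, m3 = E X_12 X_13 X_23 by enumerating
    the groups (z1, z2, z3) of three nodes."""
    m1 = m2 = m3 = 0.0
    for z1, z2, z3 in product(range(len(pi)), repeat=3):
        w = pi[z1] * pi[z2] * pi[z3]
        m1 += w * p[z1, z2]
        m2 += w * p[z1, z2] * p[z1, z3]
        m3 += w * p[z1, z2] * p[z1, z3] * p[z2, z3]
    return m1, m2, m3

pi, alpha, beta = np.array([0.3, 0.7]), 0.7, 0.2
p = np.array([[alpha, beta], [beta, alpha]])
m1, m2, m3 = moments(pi, p)

# U_2(X) = X^3 - 3 m1 X^2 + 3 m2 X - m3 is strictly increasing: one real root.
roots = np.roots([1.0, -3 * m1, 3 * m2, -m3])
real_roots = roots[np.abs(roots.imag) < 1e-9].real
alpha_hat = real_roots[0]

# V_2(alpha, Y) = (alpha + Y)(alpha - m1) - 2 (m1 alpha - m2) is linear in Y.
beta_hat = 2 * (m1 * alpha_hat - m2) / (alpha_hat - m1) - alpha_hat
```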

Proof of Theorem 6. Using equation (1) to eliminate $\alpha$ from equations (3) and (2) respectively gives two equations

$$R(\beta) = a\beta^3 + b\beta^2 + c\beta + d = 0,$$

$$S(\beta) = A\beta^2 + B\beta + C = 0,$$

where

$$a = -2s_2^3 + 3s_2 s_3 - s_3, \qquad b = 3m_1(s_2^3 - 2s_2 s_3 + s_3), \qquad c = 3m_1^2 s_3 (s_2 - 1), \qquad d = m_1^3 s_3 - m_3 s_2^3,$$

and

$$A = s_3 - s_2^2, \qquad B = -2m_1(s_3 - s_2^2), \qquad C = m_1^2 s_3 - m_2 s_2^2.$$



To understand the degrees of these polynomials we need the following. 
Lemma 19. Suppose $\pi \in [0, 1]^Q$ with $\sum_{q=1}^{Q} \pi_q = 1$.

i) If $\pi_q > 0$ for at least two values of $q$, then $a \ne 0$.

ii) $A = 0$ if, and only if, $\pi$ is uniform on its support.

Proof. To establish claim i), first observe that $0 < s_2 < 1$. Moreover, since $s_3^2 \le s_2 s_4$ by the Cauchy-Schwarz inequality, and $s_4 < s_2^2$ by comparing terms (since at least two $\pi_q > 0$), we have $s_3 < s_2^{3/2}$. If $-2s_2^3 + 3s_2 s_3 - s_3 = 0$, then

$$s_2^{3/2} > s_3 = \frac{2s_2^3}{3s_2 - 1},$$

where the denominator must be positive. Thus

$$1 > \frac{2s_2^{3/2}}{3s_2 - 1},$$

so

$$0 > 2s_2^{3/2} - 3s_2 + 1.$$

However, the function $x \mapsto 2x^{3/2} - 3x + 1$ is positive on $(0, 1)$ (its derivative $3x^{1/2} - 3$ is negative there and it vanishes at $x = 1$), so this is a contradiction.

Turning to claim ii), we have $A = s_3 - s_2^2$ and by the Cauchy-Schwarz inequality, $s_2^2 = \big(\sum_q \pi_q^{1/2} \pi_q^{3/2}\big)^2 \le s_1 s_3 = s_3$, with equality if, and only if, $(\pi_1^{1/2}, \ldots, \pi_Q^{1/2}) = \lambda(\pi_1^{3/2}, \ldots, \pi_Q^{3/2})$ for some value $\lambda \in \mathbb{R}$. This can only occur if on its support $\pi$ is uniform. □

Returning to the proof of Theorem 6, if $\pi$ is not uniform, we thus have $A \ne 0$, and dividing the polynomial $R(\beta)$ by $S(\beta)$ produces a linear remainder $T(\beta)$, which is calculated to be

$$T(\beta) = \frac{s_2^2}{s_2^2 - s_3} \Big[ (m_2 - m_1^2)(s_3 - 3s_3 s_2 + 2s_2^3)\beta + (s_3 - s_2 s_3) m_1^3 + (s_2^3 - s_3) m_2 m_1 + (s_3 s_2 - s_2^3) m_3 \Big].$$

Since any common zero of $R(\beta)$ and $S(\beta)$ must also be a zero of $T(\beta)$, we can recover the parameters $\beta$ and $\alpha$ via the rational formulas

$$\beta = \frac{(s_3 - s_2 s_3) m_1^3 + (s_2^3 - s_3) m_2 m_1 + (s_3 s_2 - s_2^3) m_3}{(m_1^2 - m_2)(2s_2^3 - 3s_3 s_2 + s_3)}, \qquad (7)$$

$$\alpha = \frac{m_1 + (s_2 - 1)\beta}{s_2}. \qquad (8)$$

Note that a calculation shows

$$m_1^2 - m_2 = (\alpha - \beta)^2 (s_2^2 - s_3), \qquad (9)$$

which, since $A \ne 0$, is only zero in the trivial case $\alpha = \beta$. Otherwise, since $2s_2^3 - 3s_3 s_2 + s_3 = -a \ne 0$ by part i) of Lemma 19, the formulas (7) and (8) are valid.

Equation (9), together with part ii) of Lemma 19, further shows that if $m_2 \ne m_1^2$, then $\pi$ is not uniform.
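If $m_2 = m_1^2$, then $\pi$ is uniform, and $S(\beta)$ is identically zero. However, in this case $s_2 = 1/Q$ and $s_3 = 1/Q^2$, so that $a = (1 - Q)/Q^3 \ne 0$, and the coefficients of $R(\beta)/a$ simplify to

$$\frac{b}{a} = -3m_1, \qquad \frac{c}{a} = 3m_1^2, \qquad \frac{d}{a} = \frac{Qm_1^3 - m_3}{1 - Q} = -m_1^3 + \frac{m_1^3 - m_3}{1 - Q}.$$

Thus

$$\frac{R(\beta)}{a} = (\beta - m_1)^3 + \frac{m_1^3 - m_3}{1 - Q},$$

which has a unique real root

$$\beta = m_1 + \left(\frac{m_1^3 - m_3}{Q - 1}\right)^{1/3},$$

where $(\cdot)^{1/3}$ denotes the real cube root. The parameter $\alpha$ can then be found by formula (8).

□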
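The estimators of Theorem 6 are explicit, so they can be checked against moments computed from known parameters. In this sketch, $m_1 = \mathbb{E}(X_{12})$, $m_2 = \mathbb{E}(X_{12}X_{13})$ and $m_3 = \mathbb{E}(X_{12}X_{13}X_{23})$ are obtained by enumerating the groups of three nodes rather than from equations (1) to (3); the non-uniform branch applies formulas (7) and (8), and the uniform branch uses the real cube root. The helper names, the use of `numpy.isclose` to decide whether $m_2 = m_1^2$ (which could misclassify a nearly uniform prior), and the parameter values are assumptions made for illustration.

```python
from itertools import product
import numpy as np

def moments(pi, alpha, beta):
    """m1 = E X12, m2 = E X12 X13, m3 = E X12 X13 X23 for the binary
    affiliation model, by enumerating the groups of three nodes."""
    def p(a, b):
        return alpha if a == b else beta
    m1 = m2 = m3 = 0.0
    for z1, z2, z3 in product(range(len(pi)), repeat=3):
        w = pi[z1] * pi[z2] * pi[z3]
        x12, x13, x23 = p(z1, z2), p(z1, z3), p(z2, z3)
        m1 += w * x12
        m2 += w * x12 * x13
        m3 += w * x12 * x13 * x23
    return m1, m2, m3

def recover(pi, m1, m2, m3):
    """(alpha, beta) from known priors pi and moments, following Theorem 6."""
    Q, s2, s3 = len(pi), np.sum(pi ** 2), np.sum(pi ** 3)
    if np.isclose(m2, m1 ** 2):
        # Uniform priors (alpha != beta assumed): real cube-root formula.
        beta = m1 + np.cbrt((m1 ** 3 - m3) / (Q - 1))
    else:
        # Formula (7).
        num = ((s3 - s2 * s3) * m1 ** 3 + (s2 ** 3 - s3) * m2 * m1
               + (s3 * s2 - s2 ** 3) * m3)
        beta = num / ((m1 ** 2 - m2) * (2 * s2 ** 3 - 3 * s3 * s2 + s3))
    alpha = (m1 + (s2 - 1) * beta) / s2        # formula (8)
    return alpha, beta

pi = np.array([0.7, 0.3])
alpha_hat, beta_hat = recover(pi, *moments(pi, 0.8, 0.2))
```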



Proof of Proposition 9. First, note that the distribution of $K_n$ may be parameterized using the elementary symmetric polynomials evaluated at the $\{\pi_q\}_{1 \le q \le Q}$, instead of the values $\{\pi_q\}_{1 \le q \le Q}$. Indeed, the affiliation model distribution only involves the $\pi_q$'s through the symmetric expressions

$$\sum_{q_1, \ldots, q_s \text{ distinct}} \pi_{q_1}^{n_1} \cdots \pi_{q_s}^{n_s},$$

with $s \le Q$ and $\sum_{k \le s} n_k = n$, and these sums may be expressed as polynomials in the $\{\sigma_i(\pi_1, \ldots, \pi_Q)\}_{1 \le i \le n}$. Thus for identifiability of the $\{\pi_q\}$ from the distribution of $K_n$, it is necessary that the $\{\pi_q\}$ be identifiable from the $\{\sigma_i(\pi_1, \ldots, \pi_Q)\}_{1 \le i \le n}$. Note also that $\sigma_1(\pi_1, \ldots, \pi_Q) = \sum_{i=1}^{Q} \pi_i = 1$ carries no information on the $\pi_q$'s that is not already known.

Now if $n < Q$, identifying $Q - 1$ independent choices of the $\pi_q$ from the values of $n - 1$ continuous functions of those $\pi_q$ is impossible. □

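Lemma 20. For the random graph affiliation model on $Q$ nodes, with binary edge state variables, uniform group priors, and connectivities $\alpha \ne \beta$, the moment inequality $m_{41} > m_1^4$ holds.

Proof. Note

$$m_{41} = \mathbb{E}[\mathbb{E}(X_{12}X_{23} \mid Z_1, Z_3)\, \mathbb{E}(X_{34}X_{41} \mid Z_1, Z_3)] = \mathbb{E}[\mathbb{E}(X_{12}X_{23} \mid Z_1, Z_3)^2] \ge \big(\mathbb{E}[\mathbb{E}(X_{12}X_{23} \mid Z_1, Z_3)]\big)^2 = m_2^2.$$

However, equality occurs above only if $\mathbb{E}(X_{12}X_{23} \mid Z_1, Z_3)$ is constant. But

$$\mathbb{E}(X_{12}X_{23} \mid Z_1 = Z_3) = \frac{\alpha^2 + (Q-1)\beta^2}{Q}, \qquad \mathbb{E}(X_{12}X_{23} \mid Z_1 \ne Z_3) = \frac{2\alpha\beta + (Q-2)\beta^2}{Q},$$

so the difference of these expectations is $(\alpha - \beta)^2/Q \ne 0$. Thus $m_{41} > m_2^2$.

A similar argument that $m_2 \ge m_1^2$ was given in the proof of Theorem 4, so the claim is established. □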
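Lemma 20 can be illustrated numerically with uniform priors. The sketch (hypothetical helper name, arbitrary $Q$, $\alpha$, $\beta$) computes $m_1$, $m_2 = \mathbb{E}(X_{12}X_{23})$ and $m_{41} = \mathbb{E}(X_{12}X_{23}X_{34}X_{41})$ by enumerating the groups of four nodes. Under uniform priors one finds $m_2 = m_1^2$, so the strict inequality comes entirely from $m_{41} > m_2^2$.

```python
from itertools import product
import numpy as np

def cycle_moments(Q, alpha, beta):
    """m1 = E X12, m2 = E X12 X23 and m41 = E X12 X23 X34 X41 for the binary
    affiliation model with uniform priors, by enumerating four node groups."""
    p = np.full((Q, Q), beta) + (alpha - beta) * np.eye(Q)
    w = 1.0 / Q ** 4                   # uniform weight of each (z1, z2, z3, z4)
    m1 = m2 = m41 = 0.0
    for z1, z2, z3, z4 in product(range(Q), repeat=4):
        m1 += w * p[z1, z2]
        m2 += w * p[z1, z2] * p[z2, z3]
        m41 += w * p[z1, z2] * p[z2, z3] * p[z3, z4] * p[z4, z1]
    return m1, m2, m41

m1, m2, m41 = cycle_moments(3, 0.6, 0.1)
```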

5.4. Proofs for the continuous parametric model

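Proof of Theorem 12. With $\bar p_{ql} = 1 - p_{ql}$, the distribution of $(X_{ij}, X_{ik}, X_{jk})$ is given by the mixture

$$\sum_{1 \le q, l, m \le Q} \pi_q \pi_l \pi_m \big[\bar p_{ql} \delta_0(X_{ij}) + p_{ql} F(X_{ij}, \theta_{ql})\big] \times \big[\bar p_{qm} \delta_0(X_{ik}) + p_{qm} F(X_{ik}, \theta_{qm})\big] \times \big[\bar p_{lm} \delta_0(X_{jk}) + p_{lm} F(X_{jk}, \theta_{lm})\big]. \qquad (10)$$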

Since the distributions $F(\cdot, \theta)$ have no point masses at 0 by Assumption 2, the family $\mathcal{M} \cup \{\delta_0\}$ has identifiable parameters for finite mixtures, so Theorem 1 of Teicher (1967) applies to it. Thus multiplying out the terms of the mixture in (10) to view it as a mixture of products from $\mathcal{M} \cup \{\delta_0\}$, and noting that by Assumption 1 certain of the components arise from unique choices of $q, l, m$, we can identify the terms of the form

$$\pi_q \pi_l \pi_m\, p_{ql} p_{qm} p_{lm}\, F(X_{ij}, \theta_{ql}) F(X_{ik}, \theta_{qm}) F(X_{jk}, \theta_{lm}),$$

and the vectors in

$$\mathcal{C} = \{(\pi_q \pi_l \pi_m p_{ql} p_{qm} p_{lm};\ \theta_{ql}, \theta_{qm}, \theta_{lm}) \mid 1 \le q, l, m \le Q\},$$

but only as an unordered set. But by Assumption 1, there are only $Q$ vectors in this set for which the last entries $(\theta_{ql}, \theta_{qm}, \theta_{lm})$ are all equal. Indeed, these entries are of the form $(\theta_{qq}, \theta_{qq}, \theta_{qq})$ for some $1 \le q \le Q$, since the case where these entries would be of the form $(\theta_{ql}, \theta_{ql}, \theta_{ql})$ for some $l \ne q$ is not possible. Thus the $\theta_{qq}$ for $1 \le q \le Q$ may be identified as well as the corresponding weights $(\pi_q p_{qq})^3$, or equivalently the values $\pi_q p_{qq}$.

Now, among the vectors in $\mathcal{C}$, exactly $3Q(Q-1)$ of them have exactly two of the last three entries equal. These entries are, up to order, of the form $(\theta_{qq}, \theta_{ql}, \theta_{ql})$, for any $q \ne l$. Thus we obtain the set $\{(\pi_q^2 \pi_l p_{ql}^2 p_{qq};\ \theta_{qq}, \theta_{ql}, \theta_{ql})\}_{1 \le q \ne l \le Q}$, without regard to order. Since we already identified the pairs $(\pi_q p_{qq}, \theta_{qq})$, we may take the ratio between the weights $\pi_q^2 \pi_l p_{ql}^2 p_{qq}$ and $\pi_q p_{qq}$ to recover the values $\pi_q \pi_l p_{ql}^2$. Thus we identify the set

$$\{(\pi_q \pi_l p_{ql}^2;\ \theta_{qq}, \theta_{ql}, \theta_{ql})\}_{1 \le q \ne l \le Q}.$$

Among these vectors, we can match those sharing the same repeated entry, namely those of the form $(\pi_q \pi_l p_{ql}^2;\ \theta_{qq}, \theta_{ql}, \theta_{ql})$ with $(\pi_q \pi_l p_{ql}^2;\ \theta_{ll}, \theta_{ql}, \theta_{ql})$. This enables us to recover the values $\theta_{ql}$, for $1 \le q, l \le Q$.
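The counting used in this step can be checked directly: with generic (hence pairwise distinct) $\theta_{ql}$, enumerating all ordered triples $(q, l, m)$ and counting the distinct values among $(\theta_{ql}, \theta_{qm}, \theta_{lm})$ should give $Q$ triples with a single value, $3Q(Q-1)$ with exactly two, and $Q(Q-1)(Q-2)$ with three. The random values below are an arbitrary stand-in for Assumption 1.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(5)
Q = 4
theta = rng.normal(size=(Q, Q))
theta = np.triu(theta) + np.triu(theta, 1).T     # symmetric; generically distinct

counts = {1: 0, 2: 0, 3: 0}                      # keyed by number of distinct entries
for q, l, m in product(range(Q), repeat=3):
    entries = {theta[q, l], theta[q, m], theta[l, m]}
    counts[len(entries)] += 1
```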

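By marginalizing the distribution of $(X_{ij}, X_{ik}, X_{jk})$, we also have the distribution of a single edge variable $X_{ij}$,

$$\sum_{1 \le q, l \le Q} \pi_q \pi_l \big[\bar p_{ql} \delta_0(X_{ij}) + p_{ql} F(X_{ij}, \theta_{ql})\big], \qquad (11)$$

and thus by our hypotheses can also identify $\{(\pi_q \pi_l p_{ql}, \theta_{ql})\}_{1 \le q \le l \le Q}$, without order (for $q \ne l$, both orderings of the pair contribute, so the weight of $F(\cdot, \theta_{ql})$ in (11) is $2\pi_q \pi_l p_{ql}$). But as the $\theta_{ql}$ have already been identified, we may use this to match $\pi_q \pi_l p_{ql}$ with $\pi_q \pi_l p_{ql}^2$ and thus recover $p_{ql}$ from the ratio, for $q \ne l$. For $q = l$, the ratio of $\pi_q^2 p_{qq}$ to $\pi_q p_{qq}$ recovers $\pi_q$, and then $p_{qq}$.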
Thus, all parameters of the model are identified, up to permutation of the group labels. □



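Proof of Theorem 13. From the distribution of $K_3$, we can distinguish $(\alpha, \theta_{in})$ from $(\beta, \theta_{out})$ as follows. The distribution of $K_3$ is the mixture of either 4 (when $Q = 2$) or 5 (when $Q \ge 3$) different 3-dimensional components. Since the distributions $F(\cdot, \theta)$ do not have point masses at 0 by Assumption 2, we can identify from this mixture that part with no such Dirac masses in it, which is the mixture

$$\alpha^3 \Big(\sum_{q=1}^{Q} \pi_q^3\Big) F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in})
+ \alpha\beta^2 \Big(\sum_{1 \le q \ne l \le Q} \pi_q^2 \pi_l\Big) \big[F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) + F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{out}) + F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in})\big]
+ \beta^3 \Big(\sum_{q, l, m \text{ distinct}} \pi_q \pi_l \pi_m\Big) F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}),$$

where the last term appears only when $Q \ge 3$.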

By Theorem 1 of Teicher (1967) and Assumption 2, this 3-dimensional mixture has identifiable parameters, up to label swapping issues. At most two terms in this mixture have the same measure $F$ in each coordinate. The three remaining terms have two coordinates which are equal, involving $\theta_{out}$, and one different, involving $\theta_{in}$. Thus we can distinguish between $\theta_{in}$ and $\theta_{out}$.

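We may also determine $\alpha^3 \big(\sum_q \pi_q^3\big)$ as the weight of $F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in})$. Similarly, from the $\delta_0 \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in})$ term in the full mixture, we may recover the weight $(1 - \alpha)\alpha^2 \big(\sum_q \pi_q^3\big)$. Summing these two weights yields $\alpha^2 \big(\sum_q \pi_q^3\big)$, and then dividing the first by this, we recover $\alpha$.

The parameter $\beta$ is similarly recovered from the weights of $F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in})$ and $\delta_0 \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in})$.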

Next we consider the distribution of $K_n$ for various $n$. This is a mixture of many different $\binom{n}{2}$-dimensional components. As above, we can identify up to label swapping the components with no $\delta_0$ factors in this mixture. But as we already know the value of $\theta_{in}$, we can identify the term $\bigotimes_{1 \le i < j \le n} F(X_{ij}, \theta_{in})$ in this mixture, and thus its corresponding prior $\alpha^{\binom{n}{2}} \sum_q \pi_q^n$. Since $\alpha$ has been previously identified, this uniquely determines $\sum_q \pi_q^n$. Note that using the distribution of $K_Q$, we can obtain the distribution of each $K_n$ with $n \le Q$ and thus the values $\{\sum_q \pi_q^n\}_{n \le Q}$.

By the Newton identities, these values determine the values of the elementary symmetric polynomials $\{\sigma_n(\pi_1, \ldots, \pi_Q)\}_{n \le Q}$. These, in turn, are (up to sign) the coefficients of the monic polynomial whose roots (with multiplicities) are precisely $\{\pi_q\}_{1 \le q \le Q}$. Thus the node priors are determined, up to order.

□ 
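The last step of this proof is a standard computation. A minimal sketch, assuming the power sums $\sum_q \pi_q^n$, $n = 1, \ldots, Q$, are known exactly: Newton's identities give the elementary symmetric polynomials, and the roots of the corresponding monic polynomial are the priors. The function name is hypothetical, and in practice root finding from estimated power sums would be numerically sensitive.

```python
import numpy as np

def priors_from_power_sums(power_sums):
    """Recover {pi_q} (as a sorted array) from p_n = sum_q pi_q^n, n = 1..Q,
    via Newton's identities and the roots of the monic polynomial."""
    Q = len(power_sums)
    e = [1.0]                                        # e_0 = 1
    for k in range(1, Q + 1):
        # Newton: k e_k = sum_{i=1}^{k} (-1)^(i-1) e_{k-i} p_i
        e.append(sum((-1) ** (i - 1) * e[k - i] * power_sums[i - 1]
                     for i in range(1, k + 1)) / k)
    # prod_q (x - pi_q) = x^Q - e_1 x^(Q-1) + e_2 x^(Q-2) - ...
    coeffs = [(-1) ** k * e[k] for k in range(Q + 1)]
    return np.sort(np.roots(coeffs).real)

pi = np.array([0.5, 0.3, 0.2])
power_sums = [np.sum(pi ** n) for n in range(1, len(pi) + 1)]
```

Calling `priors_from_power_sums(power_sums)` returns the priors in increasing order, which is the "up to order" ambiguity of the theorem.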
5.5. Proof of Theorem 14

The proof follows the strategy described in Section 5.1. We thus proceed with a base case, an extension step, and a conclusion.

Base case. We consider a subset $\mathcal{E}$ of the set of all edges over $m$ vertices, with $m$ and $\mathcal{E}$ to be chosen later. Let $A$ be the $Q^m \times \kappa^{|\mathcal{E}|}$ matrix containing the probabilities of the clumped random variable $Y = (X_e)_{e \in \mathcal{E}}$ with state space $\{1, \ldots, \kappa\}^{|\mathcal{E}|}$, conditional on the hidden states of the $m$ vertices.

Let $\mathcal{I} \in \{1, \ldots, Q\}^m$ be a vector specifying particular states of all the node variables. For each edge $e \in \mathcal{E}$, the endpoints are in some set of hidden states $\{q, l\}$, which we denote by $\mathcal{I}(e)$. The $(\mathcal{I}, (x_e)_{e \in \mathcal{E}})$-entry of the matrix $A$ is then given by

$$\prod_{e \in \mathcal{E}} \prod_{k=1}^{\kappa} p_{\mathcal{I}(e)}(k)^{\mathbf{1}_{\{x_e = k\}}},$$

where $\mathbf{1}_A$ is the indicator function for a set $A$.

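For each edge $e$ in the graph, we introduce $\kappa$ indeterminates, $t_{e,1}, \ldots, t_{e,\kappa}$. We create a $\kappa^{|\mathcal{E}|}$-element column vector $t$ indexed by the states of the clumped variable $Y$, whose $(x_e)_{e \in \mathcal{E}}$-th entry is given by

$$\prod_{e \in \mathcal{E}} \prod_{k=1}^{\kappa} t_{e,k}^{\mathbf{1}_{\{x_e = k\}}}.$$

Then the $\mathcal{I}$th entry of the $Q^m$-entry vector $At$ is the polynomial function

$$f_{\mathcal{I}} = \sum_{(x_e)_{e \in \mathcal{E}}} \prod_{e \in \mathcal{E}} \prod_{k=1}^{\kappa} \{p_{\mathcal{I}(e)}(k)\, t_{e,k}\}^{\mathbf{1}_{\{x_e = k\}}} = \prod_{e \in \mathcal{E}} \big(p_{\mathcal{I}(e)}(1)\, t_{e,1} + \cdots + p_{\mathcal{I}(e)}(\kappa)\, t_{e,\kappa}\big).$$

Independence of the rows of $A$ is equivalent to the independence of the polynomials $\{f_{\mathcal{I}}\}_{\mathcal{I} \in \{1, \ldots, Q\}^m}$. Thus, suppose that we have

$$\sum_{\mathcal{I}} a_{\mathcal{I}} f_{\mathcal{I}} = 0, \qquad (12)$$

and let us show then that every $a_{\mathcal{I}}$ must be 0.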

For a specific $e \in \mathcal{E}$, and any choice $\{q, l\}$ with $1 \le q \le l \le Q$, one can choose a point $t_{e,\{q,l\}} = (t_{e,1}, \ldots, t_{e,\kappa}) \in \mathbb{R}^{\kappa}$ in the zero set of all the polynomial functions $f_{\mathcal{I}}$ in (12), except those with $\mathcal{I}(e) = \{q, l\}$. To see this, let $M$ be the $\binom{Q+1}{2} \times \kappa$ matrix whose $\{q, l\}$th row is given by the vector $p_{ql} = (p_{ql}(1), \ldots, p_{ql}(\kappa))$. $M$ has full row rank since its rows are independent by assumption. Thus there is a solution $t_{e,\{q,l\}}$ to

$$M\, t_{e,\{q,l\}}^T = e_{\{q,l\}},$$

where $e_{\{q,l\}}$ is the vector of size $\binom{Q+1}{2}$ with zero entries, except the $\{q, l\}$th which is equal to 1. The independence assumption also implies $\kappa \ge \binom{Q+1}{2}$.

Note that in this construction we have only specified group assignments to two nodes up to node permutation. Thus if the $\{q, l\}$ row of $M$ is related to an edge $e = (i, j)$ because $\mathcal{I}(e) = \{q, l\}$, we may have that either $i$ is in state $q$ and $j$ is in state $l$, or $i$ is in state $l$ and $j$ is in state $q$.
By evaluating the $f_I$ at $t_e^{\{q,l\}}$ for many edges $e$ and choices of node states $\{q,l\}$, we can annihilate all the polynomials $f_I$ except those satisfying specific constraints on the node states. More precisely, we can make all the $f_I$ vanish except those for which $I$ satisfies the condition that, for some subset of edges $\mathcal{E}' \subseteq \mathcal{E}$ and some sequence of unordered node assignments $(\{q_e,l_e\})_{e\in\mathcal{E}'}$, we have

$$ I \in \bigcap_{e\in\mathcal{E}'} S(e;\{q_e,l_e\}), \qquad (13) $$

where $S(e;\{q_e,l_e\}) = \{ I \in \{1,\dots,Q\}^m \mid I(e) = \{q_e,l_e\} \}$.

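To conclude that each $a_I = 0$ in equation (12), it is enough to construct, for every $I \in \{1,\dots,Q\}^m$, a set as in (13) containing only $I$. Indeed, evaluating (12) at the corresponding points then leaves $a_I f_I = 0$, where $f_I$ is reduced to the product of the factors for the edges outside $\mathcal{E}'$, a nonzero polynomial since each $p_{ql}$ is a nonzero vector; hence $a_I = 0$.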

In fact, this can be achieved with only $m = 3$ vertices and the full set of edges $\mathcal{E} = \{(1,2),(1,3),(2,3)\}$. Indeed, up to permutation of the nodes and of the labels of the groups, $I$ can take only three different values, namely $(1,1,1)$, $(1,1,2)$ and $(1,2,3)$. Using a node assignment on the edges in $\mathcal{E}' = \{(1,2),(2,3)\}$, we get

$$ \begin{aligned}
\{(1,1,1)\} &= S((1,2);\{1,1\}) \cap S((2,3);\{1,1\}),\\
\{(1,1,2)\} &= S((1,2);\{1,1\}) \cap S((2,3);\{1,2\}),\\
\{(1,2,3)\} &= S((1,2);\{1,2\}) \cap S((2,3);\{2,3\}).
\end{aligned} $$
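These three identities, and the stronger fact that every $I \in \{1,\dots,Q\}^3$ (not only the three representatives) is isolated by some $\mathcal{E}' \subseteq \mathcal{E}$ and assignment, are easy to confirm by brute force. The sketch below assumes $Q = 3$ and represents $S(e;\{q,l\})$ as a Python set of state tuples; it is a finite check for one value of $Q$, not a proof.

```python
import itertools

Q = 3                                                         # assumed number of groups
STATES = list(itertools.product(range(1, Q + 1), repeat=3))   # all I in {1,...,Q}^3
EDGES = [(1, 2), (1, 3), (2, 3)]                              # complete edge set, m = 3

def I_of(I, e):
    """Unordered pair I(e) of hidden states at the endpoints of edge e = (i, j)."""
    i, j = e
    return frozenset((I[i - 1], I[j - 1]))

def S(e, ql):
    """S(e; {q,l}) = {I : I(e) = {q,l}}, as a Python set of state tuples."""
    return {I for I in STATES if I_of(I, e) == frozenset(ql)}

# The three identities displayed in the text.
assert S((1, 2), {1, 1}) & S((2, 3), {1, 1}) == {(1, 1, 1)}
assert S((1, 2), {1, 1}) & S((2, 3), {1, 2}) == {(1, 1, 2)}
assert S((1, 2), {1, 2}) & S((2, 3), {2, 3}) == {(1, 2, 3)}

def isolated(I):
    """True if some subset E' of EDGES, with assignments {q_e,l_e} = I(e), cuts out {I}."""
    for r in range(1, len(EDGES) + 1):
        for sub in itertools.combinations(EDGES, r):
            cut = set(STATES)
            for e in sub:
                cut &= S(e, I_of(I, e))
            if cut == {I}:
                return True
    return False

all_isolated = all(isolated(I) for I in STATES)
print(all_isolated)   # True
```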

Thus, we proved the following lemma. 

Lemma 21. With $\mathcal{E}$ the complete set of edges over $m = 3$ vertices, the $Q^3 \times \kappa^3$ matrix $A$ containing the probabilities of the clumped variable $Y = (X_e)_{e\in\mathcal{E}}$, conditional on the hidden states $Z = (Z_1, Z_2, Z_3) \in \{1,\dots,Q\}^3$, has full row rank, provided the $\kappa$-entry vectors $\{p_{ql}\}_{1\le q\le l\le Q}$ are linearly independent.
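Lemma 21 can also be checked numerically on a small instance. Under the hypothetical choices $Q = 2$, $\kappa = 3$ and the hand-picked, linearly independent vectors $p_{11}, p_{12}, p_{22}$ below (illustrative values only), the sketch builds the $Q^3 \times \kappa^3 = 8 \times 27$ matrix $A$ entry by entry from the product formula and computes its rank with `numpy.linalg.matrix_rank`; vertices, groups and states are 0-indexed.

```python
import itertools
import numpy as np

Q, kappa = 2, 3                                # hypothetical sizes, kappa >= Q(Q+1)/2
edges = [(0, 1), (0, 2), (1, 2)]               # complete edge set on m = 3 vertices
p = {(0, 0): np.array([0.7, 0.2, 0.1]),        # illustrative, linearly independent p_{ql}
     (0, 1): np.array([0.2, 0.6, 0.2]),
     (1, 1): np.array([0.1, 0.2, 0.7])}

hidden = list(itertools.product(range(Q), repeat=3))                 # rows: I = (Z_1, Z_2, Z_3)
observed = list(itertools.product(range(kappa), repeat=len(edges)))  # columns: (x_e)

def entry(I, x):
    """A[I, x] = product over edges e = (i, j) of p_{I(e)}(x_e)."""
    return np.prod([p[tuple(sorted((I[i], I[j])))][x[n]] for n, (i, j) in enumerate(edges)])

A = np.array([[entry(I, x) for x in observed] for I in hidden])
print(A.shape, np.linalg.matrix_rank(A))      # expected: (8, 27) 8, i.e. full row rank
```

Each row of $A$ is a conditional distribution of $Y$, so its entries sum to 1; the rank check is what the lemma asserts.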

Conclusion of the proof. The lemma provides the base case, to which the extension step of Section 5.1 then applies. Thus, with $n = 3^2 = 9$ nodes, Kruskal's Theorem may be applied to identify, up to simultaneous row permutation, $v$, $M_1$, $M_2$, and $M_3$ as defined in that section.

The rest of the proof follows the same lines as the conclusion of the proof of Theorem 2, replacing the numbers $p_{ql}$ by the vectors $p_{ql} = (p_{ql}(1),\dots,p_{ql}(\kappa))$ and noting that these vectors are assumed to be linearly independent.

5.6. Proof of Theorem 15 

For convenience, we present the argument assuming the state space of the $\mu_{ql}$ is a subset of $\mathbb{R}$. The more general situation of a multidimensional state space can be handled similarly, along the lines of the proof of Theorem 9 of Allman et al. (2009).

Let $M_{ql}$ denote the c.d.f. of $\mu_{ql} = (1 - p_{ql})\delta_0 + p_{ql}F_{ql}$. Since the measures $\{\mu_{ql} \mid 1 \le q \le l \le Q\}$ are assumed to be linearly independent, so are the functions $\{M_{ql} \mid 1 \le q \le l \le Q\}$. Applying Lemma 17 of Allman et al. (2009) to this set of functions, there exist some $\kappa \in \mathbb{N}$ and cutpoints $u_1 < u_2 < \cdots < u_{\kappa-1}$ such that the vectors

$$ \bigl(M_{ql}(u_1), M_{ql}(u_2), \dots, M_{ql}(u_{\kappa-1}), 1\bigr), \qquad 1 \le q \le l \le Q, $$

are independent. Note that $\kappa \ge \binom{Q+1}{2}$. Also, by adding additional cutpoints if necessary, and thereby increasing $\kappa$, we may assume that among the $u_i$ are any specific real numbers we like, since appending coordinates to linearly independent vectors preserves their independence.
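Lemma 17 of Allman et al. (2009) is an existence statement; for a concrete family of distributions one can simply test candidate cutpoints. The sketch below does this for a hypothetical family with $Q = 2$, in which each $\mu_{ql} = (1 - p_{ql})\delta_0 + p_{ql}F_{ql}$ has $F_{ql}$ taken, purely for illustration, to be an exponential distribution, with made-up values of $p_{ql}$ and the rates. It stacks the vectors $(M_{ql}(u_1), \dots, M_{ql}(u_{\kappa-1}), 1)$ and checks their rank, first for three cutpoints and then after adding more.

```python
import numpy as np

# Hypothetical family for Q = 2: mu_{ql} = (1 - p) delta_0 + p Exp(rate), one (p, rate)
# per unordered pair {q,l}; the exponential F_{ql} and the numbers are illustrative only.
params = {(1, 1): (0.3, 1.0), (1, 2): (0.6, 2.0), (2, 2): (0.9, 0.5)}

def cdf(u, p, rate):
    """c.d.f. M_{ql}(u) of (1 - p) delta_0 + p Exp(rate); it is 0 for u < 0."""
    u = np.asarray(u, dtype=float)
    pos = np.clip(u, 0.0, None)                 # avoid exp overflow on the u < 0 branch
    return np.where(u < 0, 0.0, (1 - p) + p * (1 - np.exp(-rate * pos)))

def cutpoint_vectors(cuts):
    """Stack the rows (M_{ql}(u_1), ..., M_{ql}(u_{kappa-1}), 1), one per pair {q,l}."""
    return np.array([np.append(cdf(cuts, p, rate), 1.0) for p, rate in params.values()])

V = cutpoint_vectors([0.5, 1.0, 2.0])                  # kappa - 1 = 3 cutpoints, kappa = 4
V_more = cutpoint_vectors([-1.0, 0.5, 1.0, 2.0, 3.0])  # extra cutpoints add columns only
print(np.linalg.matrix_rank(V), np.linalg.matrix_rank(V_more))   # 3 3: independent rows
```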

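The independence of the above vectors is equivalent to the independence of the vectors $\{\bar M_{ql} \mid 1 \le q \le l \le Q\}$, where

$$ \bar M_{ql} = \bigl(M_{ql}(u_1),\, M_{ql}(u_2) - M_{ql}(u_1),\, \dots,\, M_{ql}(u_{\kappa-1}) - M_{ql}(u_{\kappa-2}),\, 1 - M_{ql}(u_{\kappa-1})\bigr), $$

since each $\bar M_{ql}$ is obtained from the corresponding vector above by the same invertible linear map (successive differences, inverted by partial sums).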
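The passage from $(M_{ql}(u_1), \dots, M_{ql}(u_{\kappa-1}), 1)$ to $\bar M_{ql}$ is one fixed linear map, whose matrix is the identity minus the subdiagonal shift and has determinant 1. A small NumPy sketch with hypothetical c.d.f. values shows the map, its matrix, and its inverse (partial sums), which is also how the values $M_{ql}(u_k)$ are read back from $\bar M_{ql}$ at the end of the proof.

```python
import numpy as np

def to_bin_probabilities(v):
    """(M(u_1), ..., M(u_{kappa-1}), 1) -> (M(u_1), M(u_2) - M(u_1), ..., 1 - M(u_{kappa-1}))."""
    return np.diff(np.asarray(v, dtype=float), prepend=0.0)

def to_cdf_values(mbar):
    """Inverse map: partial sums of Mbar return M(u_1), ..., M(u_{kappa-1}) and the final 1."""
    return np.cumsum(mbar)

kappa = 4
# The same differencing map as a matrix: identity minus the subdiagonal shift (det = 1).
D = np.eye(kappa) - np.eye(kappa, k=-1)

v = np.array([0.2, 0.5, 0.9, 1.0])     # hypothetical c.d.f. values at 3 cutpoints, then 1
mbar = to_bin_probabilities(v)          # interval probabilities, approximately (0.2, 0.3, 0.4, 0.1)
print(mbar, np.linalg.det(D))
```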

Note that the $k$th entry of $\bar M_{ql}$ is simply the probability that a variable with distribution $\mu_{ql}$ takes values in the interval $I_k = (u_{k-1}, u_k]$ (with the convention that $u_0 = -\infty$, $u_\kappa = \infty$). To formalize this, let

$$ Y_{ij} = \sum_{k=1}^{\kappa} k\, \mathbf{1}_{I_k}(X_{ij}) $$

be the random variable with state space $\{1, 2, \dots, \kappa\}$ indicating the interval in which the value of $X_{ij}$ lies. Thus, conditional on $Z_i = q$, $Z_j = l$, the random variable $X_{ij}$ has c.d.f. $M_{ql}$ and $Y_{ij}$ has distribution $\bar M_{ql}$ on $\{1, \dots, \kappa\}$.
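The binning that defines $Y_{ij}$ corresponds to `numpy.searchsorted` with `side="left"`, which returns, for each value, the number of cutpoints strictly below it; adding 1 gives the index $k$ of the right-closed interval $I_k = (u_{k-1}, u_k]$. The sketch below simulates a single edge variable from a hypothetical $\mu_{ql} = (1 - p)\delta_0 + p\,\mathrm{Exp}(\text{rate})$ (illustrative parameters and cutpoints, not from the paper), bins it, and compares the empirical law of $Y_{ij}$ with $\bar M_{ql}$ computed from the c.d.f.

```python
import numpy as np

rng = np.random.default_rng(3)
cuts = np.array([0.5, 1.0, 2.0])           # hypothetical cutpoints u_1 < u_2 < u_3, kappa = 4
p, rate = 0.6, 2.0                          # hypothetical mu = (1 - p) delta_0 + p Exp(rate)

def to_interval_index(x, cuts):
    """Y = k iff X lies in I_k = (u_{k-1}, u_k], with u_0 = -inf and u_kappa = +inf."""
    return np.searchsorted(cuts, x, side="left") + 1

# Simulate X from mu (atom at 0 with probability 1 - p) and bin it.
n = 200_000
x = np.where(rng.random(n) < p, rng.exponential(1 / rate, n), 0.0)
y = to_interval_index(x, cuts)

# Exact Mbar from the c.d.f. of mu at the cutpoints, versus empirical frequencies of Y.
M = (1 - p) + p * (1 - np.exp(-rate * cuts))
mbar = np.diff(np.concatenate(([0.0], M, [1.0])))
freq = np.bincount(y, minlength=len(cuts) + 2)[1:] / n
print(np.round(freq, 3), np.round(mbar, 3))
```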

Now, from the distribution of the continuous random graph mixture model on $K_9$, with edge variables $(X_{ij})_{1\le i<j\le 9}$, by binning the values of the 36 edge variables into sets of the form $\prod_{1\le i<j\le 9} I_{k_{ij}}$ with $1 \le k_{ij} \le \kappa$, we obtain the distribution of the discrete edge variables $(Y_{ij})_{1\le i<j\le 9}$ of a random graph mixture model with the same group priors on the nodes, and with mixture components built from the distributions $\bar M_{ql}$ associated to $\mu_{ql}$. By Theorem 14, the parameters of the discrete model are identifiable, up to label swapping. Imposing an arbitrary labeling, we have identified the node group priors $\pi_q$, $1 \le q \le Q$, and, for each pair of groups $q \le l$, the vector $\bar M_{ql}$. By summing entries of $\bar M_{ql}$, we obtain the values $M_{ql}(u_k)$ for $k = 1, 2, \dots, \kappa - 1$. Since we may additionally determine $M_{ql}(t)$ for any real number $t$ by including it as a cutpoint, $M_{ql}$, and hence $\mu_{ql}$, is uniquely determined.